Gene Expression Markers for Breast Cancer Prognosis



Background of the Invention 

[0001] This application claims priority under 35 U.S.C. § 119(e) to provisional
application Serial No. 60/440,661 filed on January 15, 2003, the entire disclosure of 
which is hereby expressly incorporated by reference. 

Field of the Invention 

[0002] The present invention provides genes and gene sets the expression of 
which is important in the diagnosis and/or prognosis of breast cancer. 

Description of the Related Art 

[0003] Oncologists have a number of treatment options available to them, 
including different combinations of chemotherapeutic drugs that are characterized as 
"standard of care," and a number of drugs that do not carry a label claim for particular 
cancer, but for which there is evidence of efficacy in that cancer. Best likelihood of good 
treatment outcome requires that patients be assigned to optimal available cancer 
treatment, and that this assignment be made as quickly as possible following diagnosis. 

[0004] Currently, diagnostic tests used in clinical practice are single analyte, 
and therefore do not capture the potential value of knowing relationships between dozens 
of different markers. Moreover, diagnostic tests are frequently not quantitative, relying 
on immunohistochemistry. This method often yields different results in different 
laboratories, in part because the reagents are not standardized, and in part because the 
interpretations are subjective and cannot be easily quantified. RNA-based tests have not 
often been used because of the problem of RNA degradation over time and the fact that it 
is difficult to obtain fresh tissue samples from patients for analysis. Fixed paraffin- 
embedded tissue is more readily available and methods have been established to detect 
RNA in fixed tissue. However, these methods typically do not allow for the study of 
large numbers of genes (DNA or RNA) from small amounts of material. Thus, 
traditionally fixed tissue has been rarely used other than for immunohistochemistry 
detection of proteins.

[0005] Recently, several groups have published studies concerning the
classification of various cancer types by microarray gene expression analysis (see, e.g.,
Golub et al., Science 286:531-537 (1999); Bhattacharjee et al., Proc. Natl. Acad. Sci.
USA 98:13790-13795 (2001); Chen-Hsiang et al., Bioinformatics 17 (Suppl. 1):S316-S322
(2001); Ramaswamy et al., Proc. Natl. Acad. Sci. USA 98:15149-15154 (2001)).
Certain classifications of human breast cancers based on gene expression patterns have
also been reported (Martin et al., Cancer Res. 60:2232-2238 (2000); West et al., Proc.
Natl. Acad. Sci. USA 98:11462-11467 (2001); Sorlie et al., Proc. Natl. Acad. Sci. USA
98:10869-10874 (2001); Yan et al., Cancer Res. 61:8375-8380 (2001)). However, these
studies mostly focus on improving and refining the already established classification of 
various types of cancer, including breast cancer, and generally do not provide new 
insights into the relationships of the differentially expressed genes, and do not link the 
findings to treatment strategies in order to improve the clinical outcome of cancer 
therapy. 

[0006] Although modern molecular biology and biochemistry have revealed 
hundreds of genes whose activities influence the behavior of tumor cells, state of their 
differentiation, and their sensitivity or resistance to certain therapeutic drugs, with a few 
exceptions, the status of these genes has not been exploited for the purpose of routinely 
making clinical decisions about drug treatments. One notable exception is the use of
estrogen receptor (ER) protein expression in breast carcinomas to select patients for
treatment with anti-estrogen drugs, such as tamoxifen. Another exceptional example is
the use of ErbB2 (Her2) protein expression in breast carcinomas to select patients for
treatment with the Her2 antagonist drug Herceptin® (Genentech, Inc., South San Francisco, CA).

[0007] Despite recent advances, the challenge of cancer treatment remains to 
target specific treatment regimens to pathogenically distinct tumor types, and ultimately 
personalize tumor treatment in order to maximize outcome. Hence, a need exists for tests 
that simultaneously provide predictive information about patient responses to the variety 
of treatment options. This is particularly true for breast cancer, the biology of which is 
poorly understood. It is clear that the classification of breast cancer into a few subgroups, 
such as the ErbB2+ subgroup, and subgroups characterized by low to absent gene expression
of the estrogen receptor (ER) and a few additional transcriptional factors (Perou et al.,
Nature 406:747-752 (2000)) does not reflect the cellular and molecular heterogeneity of 
breast cancer, and does not allow the design of treatment strategies maximizing patient 
response.

Summary of the Invention

[0008] The present invention provides a set of genes, the expression of which 
has prognostic value, specifically with respect to disease-free survival. 

[0009] The present invention accommodates the use of archived paraffin- 
embedded biopsy material for assay of all markers in the set, and therefore is compatible 
with the most widely available type of biopsy material. It is also compatible with several 
different methods of tumor tissue harvest, for example, via core biopsy or fine needle 
aspiration. Further, for each member of the gene set, the invention specifies 
oligonucleotide sequences that can be used in the test. 

[0010] In one aspect, the invention concerns a method of predicting the 
likelihood of long-term survival of a breast cancer patient without the recurrence of breast 
cancer, comprising determining the expression level of one or more prognostic RNA 
transcripts or their expression products in a breast cancer tissue sample obtained from the 
patient, normalized against the expression level of all RNA transcripts or their products in 
the breast cancer tissue sample, or of a reference set of RNA transcripts or their 
expression products, wherein the prognostic RNA transcript is the transcript of one or 
more genes selected from the group consisting of: TP53BP2, GRB7, PR, CD68, Bcl2,
KRT14, IRS1, CTSL, EstR1, Chk1, IGFBP2, BAG1, CEGP1, STK15, GSTM1, FHIT,
RIZ1, AIB1, SURV, BBC3, IGF1R, p27, GATA3, ZNF217, EGFR, CD9, MYBL2,
HIF1α, pS2, ErbB3, TOP2B, MDM2, RAD51C, KRT19, TS, Her2, KLK10, β-Catenin,
γ-Catenin, MCM2, PI3KC2A, IGF1, TBP, CCNB1, FBXO5, and DR5,

wherein expression of one or more of GRB7, CD68, CTSL, Chk1, AIB1,
CCNB1, MCM2, FBXO5, Her2, STK15, SURV, EGFR, MYBL2, HIF1α, and TS
indicates a decreased likelihood of long-term survival without breast cancer recurrence, 
and 

the expression of one or more of TP53BP2, PR, Bcl2, KRT14, EstR1, IGFBP2,
BAG1, CEGP1, KLK10, β-Catenin, γ-Catenin, DR5, PI3KC2A, RAD51C, GSTM1,
FHIT, RIZ1, BBC3, TBP, p27, IRS1, IGF1R, GATA3, ZNF217, CD9, pS2, ErbB3,
TOP2B, MDM2, IGF1, and KRT19 indicates an increased likelihood of long-term 
survival without breast cancer recurrence. 
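
The normalization step in [0010] can be illustrated with a minimal sketch. It assumes expression is recorded on a log2 scale and that a transcript is normalized by subtracting the mean level of a reference set; neither the scale nor the arithmetic is fixed by the text above. The function names, the choice of GAPDH and GUS as reference genes, and all numeric values are hypothetical.

```python
from statistics import mean

# Direction of association, taken from the two gene lists above
# (only a few genes are shown to keep the sketch short).
UNFAVORABLE = {"GRB7", "CD68", "CTSL", "STK15", "SURV"}
FAVORABLE = {"TP53BP2", "PR", "Bcl2", "BAG1", "GSTM1"}


def normalize(log2_expr, reference_genes):
    """Subtract the mean log2 level of the reference set from every other gene."""
    ref_level = mean(log2_expr[g] for g in reference_genes)
    return {
        gene: level - ref_level
        for gene, level in log2_expr.items()
        if gene not in reference_genes
    }


def association(gene):
    """Report which way expression of a gene points, per the lists above."""
    if gene in UNFAVORABLE:
        return "decreased likelihood of recurrence-free survival"
    if gene in FAVORABLE:
        return "increased likelihood of recurrence-free survival"
    return "not listed"


# Hypothetical measurements; GAPDH and GUS serve as an assumed reference set.
sample = {"GRB7": 9.0, "PR": 6.5, "GAPDH": 11.0, "GUS": 9.0}
normalized = normalize(sample, reference_genes={"GAPDH", "GUS"})
for gene, level in normalized.items():
    print(gene, level, association(gene))
```

Subtracting on a log scale corresponds to taking a ratio on the linear scale, which is one common way to express a transcript relative to a reference set; other normalization schemes would fit the wording of [0010] equally well.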

[0011] In a particular embodiment, the expression levels of at least two, or at 
least 5, or at least 10, or at least 15 of the prognostic RNA transcripts or their expression 
products are determined. In another embodiment, the method comprises the
determination of the expression levels of all prognostic RNA transcripts or their
expression products. 

[0012] In another particular embodiment, the breast cancer is invasive breast 
carcinoma. 

[0013] In a further embodiment, RNA is isolated from a fixed, wax-embedded 
breast cancer tissue specimen of the patient. Isolation may be performed by any 
technique known in the art, for example from core biopsy tissue or fine needle aspirate 
cells. 

[0014] In another aspect, the invention concerns an array comprising 
polynucleotides hybridizing to two or more of the following genes: α-Catenin, AIB1,
AKT1, AKT2, β-actin, BAG1, BBC3, Bcl2, CCNB1, CCND1, CD68, CD9, CDH1,
CEGP1, Chk1, CIAP1, cMet.2, Contig 27882, CTSL, DR5, EGFR, EIF4E, EPHX1,
ErbB3, EstR1, FBXO5, FHIT, FRP1, GAPDH, GATA3, G-Catenin, GRB7, GRO1,
GSTM1, GUS, HER2, HIF1A, HNF3A, IGF1R, IGFBP2, KLK10, KRT14, KRT17,
KRT18, KRT19, KRT5, Maspin, MCM2, MCM3, MDM2, MMP9, MTA1, MYBL2,
P14ARF, p27, P53, PI3KC2A, PR, PRAME, pS2, RAD51C, RB1, RIZ1, STK15,
STMY3, SURV, TGFA, TOP2B, TP53BP2, TRAIL, TS, upa, VDR, VEGF, and ZNF217.

[0015] In particular embodiments, the array comprises polynucleotides 
hybridizing to at least 3, or at least 5, or at least 10, or at least 15, or at least 20, or all of 
the genes listed above. 

[0016] In another specific embodiment, the array comprises polynucleotides 
hybridizing to the following genes: TP53BP2, GRB7, PR, CD68, Bcl2, KRT14, IRS1,
CTSL, EstR1, Chk1, IGFBP2, BAG1, CEGP1, STK15, GSTM1, FHIT, RIZ1, AIB1,
SURV, BBC3, IGF1R, p27, GATA3, ZNF217, EGFR, CD9, MYBL2, HIF1α, pS2, RIZ1,
ErbB3, TOP2B, MDM2, RAD51C, KRT19, TS, Her2, KLK10, β-Catenin, γ-Catenin,
MCM2, PI3KC2A, IGF1, TBP, CCNB1, FBXO5 and DR5.

[0017] The polynucleotides can be cDNAs, or oligonucleotides, and the solid 
surface on which they are displayed may, for example, be glass. 

[0018] In another aspect, the invention concerns a method of predicting the 
likelihood of long-term survival of a patient diagnosed with invasive breast cancer, 
without the recurrence of breast cancer, comprising the steps of: 

(1) determining the expression levels of the RNA transcripts or the expression 
products of genes or a gene set selected from the group consisting of

(a) TP53BP2, Bcl2, BAD, EPHX1, PDGFRβ, DIABLO, XIAP, YB1, CA9, and
KRT8;

(b) GRB7, CD68, TOP2A, Bcl2, DIABLO, CD3, ID1, PPM1D, MCM6, and WISP1 ; 

(c) PR, TP53BP2, PRAME, DIABLO, CTSL, IGFBP2, TIMP1, CA9, MMP9, and 
COX2; 

(d) CD68, GRB7, TOP2A, Bcl2, DIABLO, CD3, ID1, PPM1D, MCM6, and WISP1 ; 

(e) Bcl2, TP53BP2, BAD, EPHX1, PDGFRβ, DIABLO, XIAP, YB1, CA9, and
KRT8; 

(f) KRT14, KRT5, PRAME, TP53BP2, GUS1, AIB1, MCM3, CCNE1, MCM6, and 
ID1; 

(g) PRAME, TP53BP2, EstR1, DIABLO, CTSL, PPM1D, GRB7, DAPK1, BBC3,
and VEGFB; 

(h) CTSL2, GRB7, TOP2A, CCNB1, Bcl2, DIABLO, PRAME, EMS1, CA9, and 
EpCAM; 

(i) EstR1, TP53BP2, PRAME, DIABLO, CTSL, PPM1D, GRB7, DAPK1, BBC3,
and VEGFB;

(k) Chk1, PRAME, TP53BP2, GRB7, CA9, CTSL, CCNB1, TOP2A, tumor size, and
IGFBP2;

(l) IGFBP2, GRB7, PRAME, DIABLO, CTSL, β-Catenin, PPM1D, Chk1, WISP1,
and LOT1;

(m) HER2, TP53BP2, Bcl2, DIABLO, TIMP1, EPHX1, TOP2A, TRAIL, CA9, and 
AREG; 

(n) BAG1, TP53BP2, PRAME, IL6, CCNB1, PAI1, AREG, tumor size, CA9, and 
Ki67; 

(o) CEGP1, TP53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, STK15,
AKT2, and FGF18;

(p) STK15, TP53BP2, PRAME, IL6, CCNE1, AKT2, DIABLO, cMet, CCNE2, and 
COX2; 

(q) KLK10, EstR1, TP53BP2, PRAME, DIABLO, CTSL, PPM1D, GRB7, DAPK1,
and BBC3; 

(r) AIB1, TP53BP2, Bcl2, DIABLO, TIMP1, CD3, p53, CA9, GRB7, and EPHX1;

(s) BBC3, GRB7, CD68, PRAME, TOP2A, CCNB1, EPHX1, CTSL,
GSTM1, and APC;

(t) CD9, GRB7, CD68, TOP2A, Bcl2, CCNB1, CD3, DIABLO, ID1, and PPM1D;

(w) EGFR, KRT14, GRB7, TOP2A, CCNB1, CTSL, Bcl2, TP, KLK10, and CA9;

(x) HIF1α, PR, DIABLO, PRAME, Chk1, AKT2, GRB7, CCNE1, TOP2A, and
CCNB1; 

(y) MDM2, TP53BP2, DIABLO, Bcl2, AIB1, TIMP1, CD3, p53, CA9, and HER2;

(z) MYBL2, TP53BP2, PRAME, IL6, Bcl2, DIABLO, CCNE1, EPHX1, TIMP1, and
CA9; 

(aa) p27, TP53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, STK15, AKT2, and
ID1;

(ab) RAD51, GRB7, CD68, TOP2A, CIAP2, CCNB1, BAG1, IL6, FGFR1, and 
TP53BP2; 

(ac) SURV, GRB7, TOP2A, PRAME, CTSL, GSTM1, CCNB1, VDR, CA9, and
CCNE2; 

(ad) TOP2B, TP53BP2, DIABLO, Bcl2, TIMP1, AIB1, CA9, p53, KRT8, and BAD; 

(ae) ZNF217, GRB7, TP53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, APC4, 
and β-Catenin,

in a breast cancer tissue sample obtained from the patient, normalized against the 
expression levels of all RNA transcripts or their expression products in said breast cancer 
tissue sample, or of a reference set of RNA transcripts or their products; 

(2) subjecting the data obtained in step (1) to statistical analysis; and 

(3) determining whether the likelihood of said long-term survival has 
increased or decreased. 

[0019] In a further aspect, the invention concerns a method of predicting the 
likelihood of long-term survival of a patient diagnosed with estrogen receptor (ER)- 
positive invasive breast cancer, without the recurrence of breast cancer, comprising the 
steps of: 

(1) determining the expression levels of the RNA transcripts or the
expression products of genes of a gene set selected from the group consisting of CD68;
CTSL; FBXO5; SURV; CCNB1; MCM2; Chk1; MYBL2; HIF1A; cMET; EGFR; TS;
STK15; IGFR1; Bcl2; HNF3A; TP53BP2; GATA3; BBC3; RAD51C; BAG1; IGFBP2;
PR; CD9; RB1; EPHX1; CEGP1; TRAIL; DR5; p27; p53; MTA; RIZ1; ErbB3; TOP2B;
EIF4E, wherein expression of the following genes in ER-positive cancer is indicative of a
reduced likelihood of survival without cancer recurrence following surgery: CD68;
CTSL; FBXO5; SURV; CCNB1; MCM2; Chk1; MYBL2; HIF1A; cMET; EGFR; TS;
STK15, and wherein expression of the following genes is indicative of a better prognosis
for survival without cancer recurrence following surgery: IGFR1; Bcl2; HNF3A;
TP53BP2; GATA3; BBC3; RAD51C; BAG1; IGFBP2; PR; CD9; RB1; EPHX1; CEGP1;
TRAIL; DR5; p27; p53; MTA; RIZ1; ErbB3; TOP2B; EIF4E;

(2) subjecting the data obtained in step (1) to statistical analysis; and 

(3) determining whether the likelihood of said long-term survival has 
increased or decreased. 

[0020] In yet another aspect, the invention concerns a method of predicting the 
likelihood of long-term survival of a patient diagnosed with estrogen receptor (ER)- 
negative invasive breast cancer, without the recurrence of breast cancer, comprising 
determining the expression levels of the RNA transcripts or the expression products of 
genes of the gene set CCND1; UPA; HNF3A; CDH1; Her2; GRB7; AKT1; STMY3;
α-Catenin; VDR; GRO1; KT14; KLK10; Maspin, TGFα, and FRP1, wherein expression of
the following genes is indicative of a reduced likelihood of survival without cancer
recurrence: CCND1; UPA; HNF3A; CDH1; Her2; GRB7; AKT1; STMY3; α-Catenin;
VDR; GRO1, and wherein expression of the following genes is indicative of a better
prognosis for survival without cancer recurrence: KT14; KLK10; Maspin, TGFα, and
FRP1. 

[0021] In a different aspect, the invention concerns a method of preparing a 
personalized genomics profile for a patient, comprising the steps of: 

(a) subjecting RNA extracted from a breast tissue obtained from the patient to 
gene expression analysis; 

(b) determining the expression level of one or more genes selected from the 
breast cancer gene set listed in any one of Tables 1-5, wherein the expression level is 
normalized against a control gene or genes and optionally is compared to the amount 
found in a breast cancer reference tissue set; and 

(c) creating a report summarizing the data obtained by the gene expression 
analysis. 

[0022] The report may, for example, include prediction of the likelihood of 
long term survival of the patient and/or recommendation for a treatment modality of said 
patient. 

[0023] In a further aspect, the invention concerns a method for amplification 
of a gene listed in Tables 5A and B by polymerase chain reaction (PCR), comprising 
performing said PCR by using an amplicon listed in Tables 5A and B and a primer-probe 
set listed in Tables 6A-F.

[0024] In a still further aspect, the invention concerns a PCR amplicon listed
in Tables 5A and B. 

[0025] In yet another aspect, the invention concerns a PCR primer-probe set 
listed in Tables 6A-F. 

[0026] The invention further concerns a prognostic method comprising: 

(a) subjecting a sample comprising breast cancer cells obtained from a patient 
to quantitative analysis of the expression level of the RNA transcript of at least one gene 
selected from the group consisting of GRB7, CD68, CTSL, Chk1, AIB1, CCNB1,
MCM2, FBXO5, Her2, STK15, SURV, EGFR, MYBL2, HIF1α, and TS, or their product,
and 

(b) identifying the patient as likely to have a decreased likelihood of long-term 
survival without breast cancer recurrence if the normalized expression levels of the gene 
or genes, or their products, are elevated above a defined expression threshold. 

[0027] In a different aspect, the invention concerns a prognostic method 
comprising: 

(a) subjecting a sample comprising breast cancer cells obtained from a patient 
to quantitative analysis of the expression level of the RNA transcript of at least one gene 
selected from the group consisting of TP53BP2, PR, Bcl2, KRT14, EstR1, IGFBP2,
BAG1, CEGP1, KLK10, β-Catenin, γ-Catenin, DR5, PI3KC2A, RAD51C, GSTM1,
FHIT, RIZ1, BBC3, TBP, p27, IRS1, IGF1R, GATA3, ZNF217, CD9, pS2, ErbB3,
TOP2B, MDM2, IGF1, and KRT19, and 

(b) identifying the patient as likely to have an increased likelihood of long- 
term survival without breast cancer recurrence if the normalized expression levels of the 
gene or genes, or their products, are elevated above a defined expression threshold. 
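
The threshold logic of [0026] and [0027] can be sketched as follows. This is an illustrative reading only: the function name, the per-gene threshold table, and every numeric value are hypothetical, and the text does not prescribe how calls from several genes would be combined.

```python
def prognostic_calls(normalized, thresholds, unfavorable, favorable):
    """Return a call for each gene whose normalized level exceeds its threshold."""
    calls = {}
    for gene, level in normalized.items():
        threshold = thresholds.get(gene)
        if threshold is None or level <= threshold:
            continue  # no defined threshold, or not elevated above it
        if gene in unfavorable:
            calls[gene] = "decreased likelihood"
        elif gene in favorable:
            calls[gene] = "increased likelihood"
    return calls


# All numbers below are hypothetical and only exercise the logic.
calls = prognostic_calls(
    normalized={"GRB7": -1.0, "PR": -3.5, "Bcl2": -0.5},
    thresholds={"GRB7": -2.0, "PR": -3.0, "Bcl2": -1.0},
    unfavorable={"GRB7", "CD68"},
    favorable={"PR", "Bcl2"},
)
print(calls)
```

In this example GRB7 and Bcl2 exceed their thresholds and receive opposite calls, while PR stays below its threshold and yields no call.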

[0028] The invention further concerns a kit comprising one or more of (1) 
extraction buffer/reagents and protocol; (2) reverse transcription buffer/reagents and 
protocol; and (3) qPCR buffer/reagents and protocol suitable for performing any of the 
foregoing methods. 



Brief Description of the Drawings 

[0029] Table 1 is a list of genes, expression of which correlates with breast
cancer survival. Results from a retrospective clinical trial. Binary statistical analysis. 

[0030] Table 2 is a list of genes, expression of which correlates with breast 
cancer survival in estrogen receptor (ER) positive patients. Results from a retrospective 
clinical trial. Binary statistical analysis. 

[0031] Table 3 is a list of genes, expression of which correlates with breast 
cancer survival in estrogen receptor (ER) negative patients. Results from a retrospective 
clinical trial. Binary statistical analysis. 

[0032] Table 4 is a list of genes, expression of which correlates with breast 
cancer survival. Results from a retrospective clinical trial. Cox proportional hazards 
statistical analysis. 

[0033] Tables 5A and B show a list of genes, expression of which correlates
with breast cancer survival. Results from a retrospective clinical trial. The table includes 
accession numbers for the genes, and amplicon sequences used for PCR amplification. 

[0034] Tables 6A-6F include sequences for the forward and reverse
primers (designated by "f" and "r", respectively) and probes (designated by "p") used for
PCR amplification of the amplicons listed in Tables 5A-B.

Detailed Description of the Preferred Embodiment 

A. Definitions 

[0035] Unless defined otherwise, technical and scientific terms used herein 
have the same meaning as commonly understood by one of ordinary skill in the art to 
which this invention belongs. Singleton et al., Dictionary of Microbiology and Molecular
Biology 2nd ed., J. Wiley & Sons (New York, NY 1994), and March, Advanced Organic 
Chemistry Reactions, Mechanisms and Structure 4th ed., John Wiley & Sons (New York, 
NY 1992), provide one skilled in the art with a general guide to many of the terms used in 
the present application. 

[0036] One skilled in the art will recognize many methods and materials 
similar or equivalent to those described herein, which could be used in the practice of the 
present invention. Indeed, the present invention is in no way limited to the methods and 
materials described. For purposes of the present invention, the following terms are 
defined below. 



[0037] The term "microarray" refers to an ordered arrangement of hybridizable 
array elements, preferably polynucleotide probes, on a substrate. 

[0038] The term "polynucleotide," when used in singular or plural, generally 
refers to any polyribonucleotide or polydeoxyribonucleotide, which may be unmodified
RNA or DNA or modified RNA or DNA. Thus, for instance, polynucleotides as defined 
herein include, without limitation, single- and double-stranded DNA, DNA including 
single- and double-stranded regions, single- and double-stranded RNA, and RNA 
including single- and double-stranded regions, hybrid molecules comprising DNA and 
RNA that may be single-stranded or, more typically, double-stranded or include single- 
and double-stranded regions. In addition, the term "polynucleotide" as used herein refers 
to triple-stranded regions comprising RNA or DNA or both RNA and DNA. The strands 
in such regions may be from the same molecule or from different molecules. The regions 
may include all of one or more of the molecules, but more typically involve only a region 
of some of the molecules. One of the molecules of a triple-helical region often is an 
oligonucleotide. The term "polynucleotide" specifically includes cDNAs. The term 
includes DNAs (including cDNAs) and RNAs that contain one or more modified bases. 
Thus, DNAs or RNAs with backbones modified for stability or for other reasons are 
"polynucleotides" as that term is intended herein. Moreover, DNAs or RNAs comprising 
unusual bases, such as inosine, or modified bases, such as tritiated bases, are included 
within the term "polynucleotides" as defined herein. In general, the term 
"polynucleotide" embraces all chemically, enzymatically and/or metabolically modified 
forms of unmodified polynucleotides, as well as the chemical forms of DNA and RNA 
characteristic of viruses and cells, including simple and complex cells. 

[0039] The term "oligonucleotide" refers to a relatively short polynucleotide, 
including, without limitation, single-stranded deoxyribonucleotides, single- or double- 
stranded ribonucleotides, RNA:DNA hybrids and double-stranded DNAs. 
Oligonucleotides, such as single-stranded DNA probe oligonucleotides, are often 
synthesized by chemical methods, for example using automated oligonucleotide 
synthesizers that are commercially available. However, oligonucleotides can be made by 
a variety of other methods, including in vitro recombinant DNA-mediated techniques and 
by expression of DNAs in cells and organisms. 

[0040] The terms "differentially expressed gene," "differential gene 
expression" and their synonyms, which are used interchangeably, refer to a gene whose 
expression is activated to a higher or lower level in a subject suffering from a disease,
specifically cancer, such as breast cancer, relative to its expression in a normal or control
subject. The terms also include genes whose expression is activated to a higher or lower 
level at different stages of the same disease. It is also understood that a differentially 
expressed gene may be either activated or inhibited at the nucleic acid level or protein 
level, or may be subject to alternative splicing to result in a different polypeptide product. 
Such differences may be evidenced by a change in mRNA levels, surface expression, 
secretion or other partitioning of a polypeptide, for example. Differential gene expression 
may include a comparison of expression between two or more genes or their gene 
products, or a comparison of the ratios of the expression between two or more genes or 
their gene products, or even a comparison of two differently processed products of the 
same gene, which differ between normal subjects and subjects suffering from a disease, 
specifically cancer, or between various stages of the same disease. Differential 
expression includes both quantitative, as well as qualitative, differences in the temporal or 
cellular expression pattern in a gene or its expression products among, for example, 
normal and diseased cells, or among cells which have undergone different disease events 
or disease stages. For the purpose of this invention, "differential gene expression" is 
considered to be present when there is at least an about two-fold, preferably at least about 
four-fold, more preferably at least about six-fold, most preferably at least about ten-fold 
difference between the expression of a given gene in normal and diseased subjects, or in 
various stages of disease development in a diseased subject. 
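
The fold-difference criterion in [0040] can be made concrete with a short sketch. It assumes linear-scale, strictly positive expression levels and treats the "about" cutoffs as exact boundaries, which is a simplification; the function names and tier labels are illustrative.

```python
# Cutoffs from the definition above, highest first. The text says "about",
# so treating them as exact boundaries is a simplification.
TIERS = [
    (10.0, "at least about ten-fold (most preferred)"),
    (6.0, "at least about six-fold (more preferred)"),
    (4.0, "at least about four-fold (preferred)"),
    (2.0, "at least about two-fold"),
]


def fold_difference(level_a, level_b):
    """Fold difference between two positive linear-scale levels, always >= 1."""
    if level_a <= 0 or level_b <= 0:
        raise ValueError("expression levels must be positive")
    return max(level_a, level_b) / min(level_a, level_b)


def differential_tier(level_a, level_b):
    """Return the highest tier met, or None if the difference is below two-fold."""
    fold = fold_difference(level_a, level_b)
    for cutoff, label in TIERS:
        if fold >= cutoff:
            return label
    return None


print(differential_tier(8.0, 2.0))   # four-fold difference
print(differential_tier(1.0, 12.0))  # twelve-fold, direction ignored
print(differential_tier(3.0, 2.0))   # 1.5-fold, below the two-fold floor
```

Taking the larger level over the smaller makes the measure symmetric, so a gene that is twelve-fold lower in diseased tissue is treated the same as one that is twelve-fold higher.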

[0041] The phrase "gene amplification" refers to a process by which multiple 
copies of a gene or gene fragment are formed in a particular cell or cell line. The 
duplicated region (a stretch of amplified DNA) is often referred to as "amplicon." 
Usually, the amount of the messenger RNA (mRNA) produced, i.e., the level of gene 
expression, also increases in the proportion of the number of copies made of the particular 
gene expressed. 

[0042] The term "diagnosis" is used herein to refer to the identification of a 
molecular or pathological state, disease or condition, such as the identification of a 
molecular subtype of head and neck cancer, colon cancer, or other type of cancer. 

[0043] The term "prognosis" is used herein to refer to the prediction of the 
likelihood of cancer-attributable death or progression, including recurrence, metastatic 
spread, and drug resistance, of a neoplastic disease, such as breast cancer. 

[0044] The term "prediction" is used herein to refer to the likelihood that a 
patient will respond either favorably or unfavorably to a drug or set of drugs, and also the
extent of those responses, or that a patient will survive, following surgical removal of the
primary tumor and/or chemotherapy for a certain period of time without cancer
recurrence. The predictive methods of the present invention can be used clinically to 
make treatment decisions by choosing the most appropriate treatment modalities for any 
particular patient. The predictive methods of the present invention are valuable tools in 
predicting if a patient is likely to respond favorably to a treatment regimen, such as 
surgical intervention, chemotherapy with a given drug or drug combination, and/or 
radiation therapy, or whether long-term survival of the patient, following surgery and/or
termination of chemotherapy or other treatment modalities is likely. 

[0045] The term "long-term" survival is used herein to refer to survival for at 
least 3 years, more preferably for at least 8 years, most preferably for at least 10 years 
following surgery or other treatment. 

[0046] The term "tumor," as used herein, refers to all neoplastic cell growth 
and proliferation, whether malignant or benign, and all pre-cancerous and cancerous cells 
and tissues. 

[0047] The terms "cancer" and "cancerous" refer to or describe the 
physiological condition in mammals that is typically characterized by unregulated cell 
growth. Examples of cancer include but are not limited to, breast cancer, colon cancer, 
lung cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, 
cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, 
thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer. 

[0048] The "pathology" of cancer includes all phenomena that compromise the 
well-being of the patient. This includes, without limitation, abnormal or uncontrollable 
cell growth, metastasis, interference with the normal functioning of neighboring cells, 
release of cytokines or other secretory products at abnormal levels, suppression or 
aggravation of inflammatory or immunological response, neoplasia, premalignancy, 
malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, 
etc. 

[0049] "Stringency" of hybridization reactions is readily determinable by one 
of ordinary skill in the art, and generally is an empirical calculation dependent upon probe 
length, washing temperature, and salt concentration. In general, longer probes require 
higher temperatures for proper annealing, while shorter probes need lower temperatures. 
Hybridization generally depends on the ability of denatured DNA to reanneal when 
complementary strands are present in an environment below their melting temperature.
The higher the degree of desired homology between the probe and hybridizable sequence,
the higher the relative temperature which can be used. As a result, it follows that higher 
relative temperatures would tend to make the reaction conditions more stringent, while 
lower temperatures less so. For additional details and explanation of stringency of 
hybridization reactions, see Ausubel et al., Current Protocols in Molecular Biology,
Wiley Interscience Publishers, (1995). 

[0050] "Stringent conditions" or "high stringency conditions", as defined 
herein, typically: (1) employ low ionic strength and high temperature for washing, for 
example 0.015 M sodium chloride/0.0015 M sodium citrate/0.1% sodium dodecyl sulfate 
at 50°C; (2) employ during hybridization a denaturing agent, such as formamide, for 
example, 50% (v/v) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% 
polyvinylpyrrolidone/50mM sodium phosphate buffer at pH 6.5 with 750 mM sodium 
chloride, 75 mM sodium citrate at 42°C; or (3) employ 50% formamide, 5 x SSC (0.75 M 
NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium 
pyrophosphate, 5 x Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1%
SDS, and 10% dextran sulfate at 42°C, with washes at 42°C in 0.2 x SSC (sodium 
chloride/sodium citrate) and 50% formamide at 55°C, followed by a high-stringency wash 
consisting of 0.1 x SSC containing EDTA at 55°C. 
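
As a worked check on the buffer strengths quoted in [0050], the molar composition of any SSC dilution follows by scaling the 5 x figures given in condition (3). The sketch below does that arithmetic; the function name is illustrative, and rounding to six decimals is only for display.

```python
# 5x SSC composition quoted in condition (3) above, in mol/L.
SSC_5X = {"NaCl": 0.75, "sodium citrate": 0.075}


def ssc(strength):
    """Molar composition of SSC at the given strength (1.0 means 1x)."""
    return {
        solute: round(conc * strength / 5, 6)
        for solute, conc in SSC_5X.items()
    }


for strength in (5, 1, 0.2, 0.1):
    print(f"{strength}x SSC:", ssc(strength))
```

Scaled this way, 0.1 x SSC corresponds to 0.015 M sodium chloride and 0.0015 M sodium citrate, which are the same concentrations given for the low ionic strength wash in condition (1).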

[0051] "Moderately stringent conditions" may be identified as described by 
Sambrook et al., Molecular Cloning: A Laboratory Manual, New York: Cold Spring
Harbor Press, 1989, and include the use of washing solution and hybridization conditions
(e.g., temperature, ionic strength and %SDS) less stringent than those described above.
An example of moderately stringent conditions is overnight incubation at 37°C in a 
solution comprising: 20% formamide, 5 x SSC (150 mM NaCl, 15 mM trisodium citrate), 
50 mM sodium phosphate (pH 7.6), 5 x Denhardt's solution, 10% dextran sulfate, and 20 
mg/ml denatured sheared salmon sperm DNA, followed by washing the filters in 1 x SSC 
at about 37-50°C. The skilled artisan will recognize how to adjust the temperature, ionic 
strength, etc. as necessary to accommodate factors such as probe length and the like. 

[0052] In the context of the present invention, reference to "at least one," "at 
least two," "at least five," etc. of the genes listed in any particular gene set means any one 
or any and all combinations of the genes listed. 

[0053] The terms "expression threshold," and "defined expression threshold" 
are used interchangeably and refer to the level of a gene or gene product in question
above which the gene or gene product serves as a predictive marker for patient survival
without cancer recurrence. The threshold is defined experimentally from clinical studies 
such as those described in the Example below. The expression threshold can be selected 
either for maximum sensitivity, or for maximum selectivity, or for minimum error. The 
determination of the expression threshold for any situation is well within the knowledge 
of those skilled in the art. 
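
One way a threshold might be selected for minimum error, as mentioned in [0053], is sketched below. It assumes a marker for which higher expression predicts recurrence, scans the observed levels as candidate cutoffs, and keeps the one with the fewest misclassifications. The function names and the six data points are hypothetical and are not drawn from the clinical study.

```python
def confusion(levels, recurred, threshold):
    """Count (TP, FP, TN, FN) when 'level above threshold' predicts recurrence."""
    tp = fp = tn = fn = 0
    for level, rec in zip(levels, recurred):
        predicted = level > threshold
        if predicted and rec:
            tp += 1
        elif predicted and not rec:
            fp += 1
        elif not predicted and not rec:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def choose_threshold(levels, recurred):
    """Return (threshold, errors) for the lowest cutoff with the fewest errors."""
    best = None
    for candidate in sorted(set(levels)):
        _, fp, _, fn = confusion(levels, recurred, candidate)
        errors = fp + fn
        if best is None or errors < best[1]:
            best = (candidate, errors)
    return best


# Hypothetical normalized levels and recurrence outcomes for six patients.
levels = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
recurred = [False, False, True, False, True, True]

threshold, errors = choose_threshold(levels, recurred)
tp, fp, tn, fn = confusion(levels, recurred, threshold)
print("threshold:", threshold, "errors:", errors)
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```

Selecting for maximum sensitivity or maximum selectivity instead would only change the quantity compared inside `choose_threshold`; the counts returned by `confusion` already supply both.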

B. Detailed Description 

[0054] The practice of the present invention will employ, unless otherwise 
indicated, conventional techniques of molecular biology (including recombinant 
techniques), microbiology, cell biology, and biochemistry, which are within the skill of 
the art. Such techniques are explained fully in the literature, such as, "Molecular 
Cloning: A Laboratory Manual", 2nd edition (Sambrook et al., 1989); "Oligonucleotide 
Synthesis" (M.J. Gait, ed., 1984); "Animal Cell Culture" (R.I. Freshney, ed., 1987); 
"Methods in Enzymology" (Academic Press, Inc.); "Handbook of Experimental 
Immunology", 4th edition (D.M. Weir & C.C. Blackwell, eds., Blackwell Science Inc., 
1987); "Gene Transfer Vectors for Mammalian Cells" (J.M. Miller & M.P. Calos, eds., 
1987); "Current Protocols in Molecular Biology" (F.M. Ausubel et al., eds., 1987); and 
"PCR: The Polymerase Chain Reaction", (Mullis et al., eds., 1994). 

1. Gene Expression Profiling 

[0055] In general, methods of gene expression profiling can be divided into 
two large groups: methods based on hybridization analysis of polynucleotides, and 
methods based on sequencing of polynucleotides. The most commonly used methods 
known in the art for the quantification of mRNA expression in a sample include northern 
blotting and in situ hybridization (Parker & Barnes, Methods in Molecular Biology 
106:247-283 (1999)); RNAse protection assays (Hod, Biotechniques 13:852-854 (1992)); 
and reverse transcription polymerase chain reaction (RT-PCR) (Weis et al, Trends in 
Genetics 8:263-264 (1992)). Alternatively, antibodies may be employed that can 
recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA 
hybrid duplexes or DNA-protein duplexes. Representative methods for sequencing-based 
gene expression analysis include Serial Analysis of Gene Expression (SAGE), and gene 
expression analysis by massively parallel signature sequencing (MPSS). 

2. Reverse Transcriptase PCR (RT-PCR) 

[0056] Of the techniques listed above, the most sensitive and most flexible 
quantitative method is RT-PCR, which can be used to compare mRNA levels in different 
sample populations, in normal and tumor tissues, with or without drug treatment, to 
characterize patterns of gene expression, to discriminate between closely related mRNAs, 
and to analyze RNA structure. 

[0057] The first step is the isolation of mRNA from a target sample. The 
starting material is typically total RNA isolated from human tumors or tumor cell lines, 
and corresponding normal tissues or cell lines, respectively. Thus RNA can be isolated 
from a variety of primary tumors, including breast, lung, colon, prostate, brain, liver, 
kidney, pancreas, spleen, thymus, testis, ovary, uterus, etc., tumor, or tumor cell lines, 
with pooled DNA from healthy donors. If the source of mRNA is a primary tumor, 
mRNA can be extracted, for example, from frozen or archived paraffin-embedded and 
fixed (e.g. formalin-fixed) tissue samples. 

[0058] General methods for mRNA extraction are well known in the art and 
are disclosed in standard textbooks of molecular biology, including Ausubel et al., 
Current Protocols of Molecular Biology, John Wiley and Sons (1997). Methods for RNA 
extraction from paraffin embedded tissues are disclosed, for example, in Rupp and 
Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 
(1995). In particular, RNA isolation can be performed using a purification kit, buffer set 
and protease from commercial manufacturers, such as Qiagen, according to the 
manufacturer's instructions. For example, total RNA from cells in culture can be isolated 
using Qiagen RNeasy mini-columns. Other commercially available RNA isolation kits 
include MasterPure™ Complete DNA and RNA Purification Kit (EPICENTRE®, 
Madison, WI), and Paraffin Block RNA Isolation Kit (Ambion, Inc.). Total RNA from 
tissue samples can be isolated using RNA Stat-60 (Tel-Test). RNA prepared from tumor 
can be isolated, for example, by cesium chloride density gradient centrifugation. 

[0059] As RNA cannot serve as a template for PCR, the first step in gene 
expression profiling by RT-PCR is the reverse transcription of the RNA template into 
cDNA, followed by its exponential amplification in a PCR reaction. The two most 
commonly used reverse transcriptases are avian myeloblastosis virus reverse transcriptase 
(AMV-RT) and Moloney murine leukemia virus reverse transcriptase (MMLV-RT). The 
reverse transcription step is typically primed using specific primers, random hexamers, or 
oligo-dT primers, depending on the circumstances and the goal of expression profiling. 
For example, extracted RNA can be reverse-transcribed using a GeneAmp RNA PCR kit 
(Perkin Elmer, CA, USA), following the manufacturer's instructions. The derived cDNA 
can then be used as a template in the subsequent PCR reaction. 

[0060] Although the PCR step can use a variety of thermostable DNA-dependent 
DNA polymerases, it typically employs the Taq DNA polymerase, which has a 
5'-3' nuclease activity but lacks a 3'-5' proofreading endonuclease activity. Thus, 
TaqMan® PCR typically utilizes the 5'-nuclease activity of Taq or Tth polymerase to 
hydrolyze a hybridization probe bound to its target amplicon, but any enzyme with 
equivalent 5' nuclease activity can be used. Two oligonucleotide primers are used to 
generate an amplicon typical of a PCR reaction. A third oligonucleotide, or probe, is 
designed to detect nucleotide sequence located between the two PCR primers. The probe 
is non-extendible by Taq DNA polymerase enzyme, and is labeled with a reporter 
fluorescent dye and a quencher fluorescent dye. Any laser-induced emission from the 
reporter dye is quenched by the quenching dye when the two dyes are located close 
together as they are on the probe. During the amplification reaction, the Taq DNA 
polymerase enzyme cleaves the probe in a template-dependent manner. The resultant 
probe fragments disassociate in solution, and signal from the released reporter dye is free 
from the quenching effect of the second fluorophore. One molecule of reporter dye is 
liberated for each new molecule synthesized, and detection of the unquenched reporter 
dye provides the basis for quantitative interpretation of the data. 

[0061] TaqMan® RT-PCR can be performed using commercially available 
equipment, such as, for example, ABI PRISM 7700™ Sequence Detection System™ 
(Perkin-Elmer-Applied Biosystems, Foster City, CA, USA), or Lightcycler (Roche 
Molecular Biochemicals, Mannheim, Germany). In a preferred embodiment, the 5' 
nuclease procedure is run on a real-time quantitative PCR device such as the ABI PRISM 
7700™ Sequence Detection System™. The system consists of a thermocycler, laser, 
charge-coupled device (CCD), camera and computer. The system amplifies samples in a 
96-well format on a thermocycler. During amplification, laser-induced fluorescent signal 
is collected in real-time through fiber optics cables for all 96 wells, and detected at the 
CCD. The system includes software for running the instrument and for analyzing the 
data. 

[0062] 5'-Nuclease assay data are initially expressed as Ct, or the threshold 
cycle. As discussed above, fluorescence values are recorded during every cycle and 
represent the amount of product amplified to that point in the amplification reaction. The 
point when the fluorescent signal is first recorded as statistically significant is the 
threshold cycle (Ct). 
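
The idea of a threshold cycle can be illustrated with a short Python sketch. The rule used here (threshold set at the baseline mean plus ten baseline standard deviations, with the baseline taken from cycles 3 to 15 and linear interpolation between cycles) is an assumption chosen for illustration; the text above does not specify how the instrument software defines statistical significance, and the function name and data are hypothetical.

```python
from statistics import mean, stdev


def threshold_cycle(fluorescence, baseline=(3, 15), n_sd=10.0):
    """Estimate Ct from per-cycle fluorescence readings.

    fluorescence[0] is the reading for cycle 1.
    baseline gives the first and last cycle (inclusive) used as background.
    Returns a fractional cycle number, or None if the signal never crosses.
    """
    first, last = baseline
    base = fluorescence[first - 1:last]
    threshold = mean(base) + n_sd * stdev(base)
    # Scan the cycles after the baseline window for the first crossing.
    for i in range(last, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo <= threshold < hi:
            # Reading i-1 belongs to cycle i; interpolate toward cycle i+1.
            return i + (threshold - lo) / (hi - lo)
    return None


# Hypothetical curve: noisy baseline for 15 cycles, then exponential growth.
curve = [0.9 if c % 2 else 1.1 for c in range(1, 16)] + [1.5, 3.0, 6.0, 12.0]
ct = threshold_cycle(curve)
flat_ct = threshold_cycle([1.0] * 20)
```

A lower Ct corresponds to more starting template, since fewer cycles are needed to reach the threshold.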

[0063] To minimize errors and the effect of sample-to-sample variation, 
RT-PCR is usually performed using an internal standard. The ideal internal standard is 
expressed at a constant level among different tissues, and is unaffected by the 
experimental treatment. RNAs most frequently used to normalize patterns of gene 
expression are mRNAs for the housekeeping genes glyceraldehyde-3-phosphate-dehydrogenase 
(GAPDH) and β-actin. 

[0064] A more recent variation of the RT-PCR technique is the real time 
quantitative PCR, which measures PCR product accumulation through a dual-labeled 
fluorogenic probe (i.e., TaqMan® probe). Real time PCR is compatible both with 
quantitative competitive PCR, where an internal competitor for each target sequence is used 
for normalization, and with quantitative comparative PCR using a normalization gene 
contained within the sample, or a housekeeping gene for RT-PCR. For further details see, 
e.g. Heid et al., Genome Research 6:986-994 (1996). 

[0065] The steps of a representative protocol for profiling gene expression 
using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, 
purification, primer extension and amplification are given in various published journal 
articles {for example: T.E. Godfrey et al., J. Molec. Diagnostics 2: 84-91 [2000]; K. 
Specht et al., Am. J. Pathol. 158: 419-29 [2001]}. Briefly, a representative process starts 
with cutting about 10 µm thick sections of paraffin-embedded tumor tissue samples. The 
RNA is then extracted, and protein and DNA are removed. After analysis of the RNA 
concentration, RNA repair and/or amplification steps may be included, if necessary, and 
RNA is reverse transcribed using gene specific primers followed by RT-PCR. 

[0066] According to one aspect of the present invention, PCR primers and 
probes are designed based upon intron sequences present in the gene to be amplified. In 
this embodiment, the first step in the primer/probe design is the delineation of intron 
sequences within the genes. This can be done by publicly available software, such as the 
DNA BLAT software developed by Kent, W.J., Genome Res. 12(4):656-64 (2002), or by 
the BLAST software including its variations. Subsequent steps follow well established 
methods of PCR primer and probe design. 

[0067] In order to avoid non-specific signals, it is important to mask repetitive 
sequences within the introns when designing the primers and probes. This can be easily 
accomplished by using the Repeat Masker program available on-line through the Baylor 
College of Medicine, which screens DNA sequences against a library of repetitive 
elements and returns a query sequence in which the repetitive elements are masked. The 
masked intron sequences can then be used to design primer and probe sequences using 
any commercially or otherwise publicly available primer/probe design packages, such as 
Primer Express (Applied Biosystems); MGB assay-by-design (Applied Biosystems); and 
Primer3 (Steve Rozen and Helen J. Skaletsky (2000) Primer3 on the WWW for general 
users and for biologist programmers. In: Krawetz S, Misener S (eds) Bioinformatics 
Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp 
365-386). 

[0068] The most important factors considered in PCR primer design include 
primer length, melting temperature (Tm), G/C content, specificity, complementary 
primer sequences, and 3'-end sequence. In general, optimal PCR primers are 
17-30 bases in length, and contain about 20-80%, such as, for example, about 50-60% 
G+C bases. Tm's between 50 and 80 °C, e.g. about 50 to 70 °C, are typically preferred. 
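
The length, G+C, and Tm ranges above can be checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and the melting temperature uses the simple rule of thumb Tm = 2(A+T) + 4(G+C), which is an assumption made here and is only a rough estimate; dedicated design packages such as those named above use more accurate models.

```python
def primer_summary(seq):
    """Return length, G+C percentage and a rule-of-thumb Tm for a primer."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / len(seq),
        # Rough estimate only: 2 degrees C per A/T, 4 degrees C per G/C.
        "tm": 2 * at + 4 * gc,
    }


def within_general_ranges(seq):
    """Check the broad ranges quoted in the text (17-30 nt, 20-80% G+C, Tm 50-80)."""
    s = primer_summary(seq)
    return (
        17 <= s["length"] <= 30
        and 20.0 <= s["gc_percent"] <= 80.0
        and 50 <= s["tm"] <= 80
    )


# Hypothetical sequences, not primers from the study.
balanced = primer_summary("ATGCATGCATGCATGCATGC")
ok = within_general_ranges("ATGCATGCATGCATGCATGC")
at_only = within_general_ranges("ATATATATATATATATAT")
```

Specificity, primer-primer complementarity, and 3'-end composition are not addressed by this check and would need separate evaluation.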

[0069] For further guidelines for PCR primer and probe design see, e.g. 
Dieffenbach, C.W. et al., "General Concepts for PCR Primer Design" in: PCR Primer, A 
Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 1995, pp. 133- 
155; Innis and Gelfand, "Optimization of PCRs" in: PCR Protocols, A Guide to Methods 
and Applications, CRC Press, London, 1994, pp. 5-11; and Plasterer, T.N. Primerselect: 
Primer and probe design. Methods Mol. Biol. 70:520-527 (1997), the entire disclosures of 
which are hereby expressly incorporated by reference. 

3. Microarrays 

[0070] Differential gene expression can also be identified, or confirmed using 
the microarray technique. Thus, the expression profile of breast cancer-associated genes 
can be measured in either fresh or paraffin-embedded tumor tissue, using microarray 
technology. In this method, polynucleotide sequences of interest (including cDNAs and 
oligonucleotides) are plated, or arrayed, on a microchip substrate. The arrayed sequences 
are then hybridized with specific DNA probes from cells or tissues of interest. Just as in 
the RT-PCR method, the source of mRNA typically is total RNA isolated from human 
tumors or tumor cell lines, and corresponding normal tissues or cell lines. Thus RNA can 
be isolated from a variety of primary tumors or tumor cell lines. If the source of mRNA 
is a primary tumor, mRNA can be extracted, for example, from frozen or archived 
paraffin-embedded and fixed (e.g. formalin-fixed) tissue samples, which are routinely 
prepared and preserved in everyday clinical practice. 

[0071] In a specific embodiment of the microarray technique, PCR amplified 
inserts of cDNA clones are applied to a substrate in a dense array. Preferably at least 
10,000 nucleotide sequences are applied to the substrate. The microarrayed genes, 
immobilized on the microchip at 10,000 elements each, are suitable for hybridization 
under stringent conditions. Fluorescently labeled cDNA probes may be generated 
through incorporation of fluorescent nucleotides by reverse transcription of RNA 
extracted from tissues of interest. Labeled cDNA probes applied to the chip hybridize 
with specificity to each spot of DNA on the array. After stringent washing to remove 
non-specifically bound probes, the chip is scanned by confocal laser microscopy or by 
another detection method, such as a CCD camera. Quantitation of hybridization of each 
arrayed element allows for assessment of corresponding mRNA abundance. With dual 
color fluorescence, separately labeled cDNA probes generated from two sources of RNA 
are hybridized pairwise to the array. The relative abundance of the transcripts from the 
two sources corresponding to each specified gene is thus determined simultaneously. The 
miniaturized scale of the hybridization affords a convenient and rapid evaluation of the 
expression pattern for large numbers of genes. Such methods have been shown to have 
the sensitivity required to detect rare transcripts, which are expressed at a few copies per 
cell, and to reproducibly detect at least approximately two-fold differences in the 
expression levels (Schena et al., Proc. Natl. Acad. Sci. USA 93(2):106-149 (1996)). 
Microarray analysis can be performed by commercially available equipment, following 
manufacturer's protocols, such as by using the Affymetrix GeneChip technology, or 
Incyte's microarray technology. 
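
For the dual color design described above, relative abundance is commonly summarized as a log2 ratio of the two channel intensities per arrayed element. The following Python sketch assumes background-corrected intensities are already available; the gene labels, the intensity floor, and the function names are hypothetical, and no between-channel normalization is attempted.

```python
from math import log2


def log2_ratios(sample_signal, reference_signal, floor=1.0):
    """Per-gene log2(sample / reference) for two labeled cDNA populations.

    Intensities below `floor` are raised to `floor` to avoid dividing by
    zero or taking the log of a non-positive number.
    """
    ratios = {}
    for gene, sample in sample_signal.items():
        reference = reference_signal[gene]
        ratios[gene] = log2(max(sample, floor) / max(reference, floor))
    return ratios


def changed_genes(ratios, min_fold=2.0):
    """Genes whose expression differs by at least `min_fold` in either direction."""
    cutoff = log2(min_fold)
    return sorted(g for g, r in ratios.items() if abs(r) >= cutoff)


# Hypothetical intensities for three arrayed elements.
sample = {"geneA": 400.0, "geneB": 100.0, "geneC": 50.0}
reference = {"geneA": 100.0, "geneB": 110.0, "geneC": 200.0}
ratios = log2_ratios(sample, reference)
changed = changed_genes(ratios)
```

The two-fold default mirrors the approximately two-fold detection limit cited above; a log2 ratio of +1 or -1 corresponds to a two-fold increase or decrease.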

[0072] The development of microarray methods for large-scale analysis of 
gene expression makes it possible to search systematically for molecular markers of 
cancer classification and outcome prediction in a variety of tumor types. 

4. Serial Analysis of Gene Expression (SAGE) 

[0073] Serial analysis of gene expression (SAGE) is a method that allows the 
simultaneous and quantitative analysis of a large number of gene transcripts, without the 
need of providing an individual hybridization probe for each transcript. First, a short 
sequence tag (about 10-14 bp) is generated that contains sufficient information to 
uniquely identify a transcript, provided that the tag is obtained from a unique position 
within each transcript. Then, many transcripts are linked together to form long serial 
molecules that can be sequenced, revealing the identity of the multiple tags 
simultaneously. The expression pattern of any population of transcripts can be 
quantitatively evaluated by determining the abundance of individual tags, and identifying 
the gene corresponding to each tag. For more details see, e.g. Velculescu et al., Science 
270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997). 
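
The tag-counting step can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes each tag in the sequenced concatemer is separated by a CATG anchoring site and has a fixed length of 10 bases, and it ignores the ditag structure and reverse-complement handling of real SAGE libraries. The tag sequences and the tag-to-gene lookup are hypothetical.

```python
from collections import Counter


def count_tags(concatemer, anchor="CATG", tag_length=10):
    """Split a sequenced concatemer at the anchor site and count each tag."""
    pieces = concatemer.upper().split(anchor)
    # Keep only fragments of the expected tag length.
    return Counter(p for p in pieces if len(p) == tag_length)


# Hypothetical concatemer containing two distinct tags.
concatemer = "CATG" + "GATTACAGAT" + "CATG" + "TTGGCCAATT" + "CATG" + "GATTACAGAT" + "CATG"
counts = count_tags(concatemer)

# Hypothetical lookup from tag to gene, for illustration only.
tag_to_gene = {"GATTACAGAT": "geneA", "TTGGCCAATT": "geneB"}
abundance = {tag_to_gene[tag]: n for tag, n in counts.items()}
```

The relative tag counts then serve as the quantitative estimate of transcript abundance.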

5. MassARRAY Technology 

[0074] The MassARRAY (Sequenom, San Diego, California) technology is an 
automated, high-throughput method of gene expression analysis using mass spectrometry 
(MS) for detection. According to this method, following the isolation of RNA, reverse 
transcription and PCR amplification, the cDNAs are subjected to primer extension. The 
cDNA-derived primer extension products are purified, and dispensed on a chip array that 
is pre-loaded with the components needed for MALDI-TOF MS sample preparation. The 
various cDNAs present in the reaction are quantitated by analyzing the peak areas in the 
mass spectrum obtained. 

6. Gene Expression Analysis by Massively Parallel Signature Sequencing 
(MPSS) 

[0075] This method, described by Brenner et al, Nature Biotechnology 
18:630-634 (2000), is a sequencing approach that combines non-gel-based signature 
sequencing with in vitro cloning of millions of templates on separate 5 µm diameter 
microbeads. First, a microbead library of DNA templates is constructed by in vitro 
cloning. This is followed by the assembly of a planar array of the template-containing 
microbeads in a flow cell at a high density (typically greater than 3 x 10^6 
microbeads/cm^2). The free ends of the cloned templates on each microbead are analyzed 
simultaneously, using a fluorescence-based signature sequencing method that does not 
require DNA fragment separation. This method has been shown to simultaneously and 
accurately provide, in a single operation, hundreds of thousands of gene signature 
sequences from a yeast cDNA library. 

7. Immunohistochemistry 

[0076] Immunohistochemistry methods are also suitable for detecting the 
expression levels of the prognostic markers of the present invention. Thus, antibodies or 
antisera, preferably polyclonal antisera, and most preferably monoclonal antibodies 
specific for each marker are used to detect expression. The antibodies can be detected by 
direct labeling of the antibodies themselves, for example, with radioactive labels, 
fluorescent labels, hapten labels such as biotin, or an enzyme such as horseradish 
peroxidase or alkaline phosphatase. Alternatively, unlabeled primary antibody is used in 
conjunction with a labeled secondary antibody, comprising antisera, polyclonal antisera 
or a monoclonal antibody specific for the primary antibody. Immunohistochemistry 
protocols and kits are well known in the art and are commercially available. 

8. Proteomics 

[0077] The term "proteome" is defined as the totality of the proteins present in 
a sample (e.g. tissue, organism, or cell culture) at a certain point of time. Proteomics 
includes, among other things, study of the global changes of protein expression in a 
sample (also referred to as "expression proteomics"). Proteomics typically includes the 
following steps: (1) separation of individual proteins in a sample by 2-D gel 
electrophoresis (2-D PAGE); (2) identification of the individual proteins recovered from 
the gel, e.g. by mass spectrometry or N-terminal sequencing; and (3) analysis of the data 
using bioinformatics. Proteomics methods are valuable supplements to other methods of 
gene expression profiling, and can be used, alone or in combination with other methods, 
to detect the products of the prognostic markers of the present invention. 

9. General Description of the mRNA Isolation, Purification and 
Amplification 

[0078] The steps of a representative protocol for profiling gene expression 
using fixed, paraffin-embedded tissues as the RNA source, including mRNA isolation, 
purification, primer extension and amplification are given in various published journal 
articles {for example: T.E. Godfrey et al., J. Molec. Diagnostics 2: 84-91 [2000]; K. 
Specht et al., Am. J. Pathol. 158: 419-29 [2001]}. Briefly, a representative process starts 
with cutting about 10 µm thick sections of paraffin-embedded tumor tissue samples. The 
RNA is then extracted, and protein and DNA are removed. After analysis of the RNA 
concentration, RNA repair and/or amplification steps may be included, if necessary, and 
RNA is reverse transcribed using gene specific primers followed by RT-PCR. Finally, 
the data are analyzed to identify the best treatment option(s) available to the patient on the 
basis of the characteristic gene expression pattern identified in the tumor sample 
examined. 

10. Breast Cancer Gene Set, Assayed Gene Subsequences, and Clinical 
Application of Gene Expression Data 

[0079] An important aspect of the present invention is to use the measured 
expression of certain genes by breast cancer tissue to provide prognostic information. For 
this purpose it is necessary to correct for (normalize away) both differences in the amount 
of RNA assayed and variability in the quality of the RNA used. Therefore, the assay 
typically measures and incorporates the expression of certain normalizing genes, 
including well known housekeeping genes, such as GAPDH and Cyp1. Alternatively, 
normalization can be based on the mean or median signal (Ct) of all of the assayed genes 
or a large subset thereof (global normalization approach). On a gene-by-gene basis, 
measured normalized amount of a patient tumor mRNA is compared to the amount found 
in a breast cancer tissue reference set. The number (N) of breast cancer tissues in this 
reference set should be sufficiently high to ensure that different reference sets (as a 
whole) behave essentially the same way. If this condition is met, the identity of the 
individual breast cancer tissues present in a particular set will have no significant impact 
on the relative amounts of the genes assayed. Usually, the breast cancer tissue reference 
set consists of at least about 30, preferably at least about 40 different FPE breast cancer 
tissue specimens. Unless noted otherwise, normalized expression levels for each 
mRNA/tested tumor/patient will be expressed as a percentage of the expression level 
measured in the reference set. More specifically, the reference set of a sufficiently high 
number (e.g. 40) of tumors yields a distribution of normalized levels of each mRNA 
species. The level measured in a particular tumor sample to be analyzed falls at some 
percentile within this range, which can be determined by methods well known in the art. 
Below, unless noted otherwise, reference to expression levels of a gene assumes 
normalized expression relative to the reference set although this is not always explicitly 
stated. 
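
Placing a tumor's normalized level within the reference-set distribution can be sketched as a percentile-rank calculation. The code below is one of several standard definitions (ties count as half below), chosen here as an assumption; the reference values are made-up placeholders for a 40-specimen reference set, and the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_in_reference(value, reference_levels):
    """Percentile rank (0-100) of `value` within the reference distribution.

    Reference values equal to `value` are counted as half below, half above.
    """
    ordered = sorted(reference_levels)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical normalized levels of one mRNA in 40 reference tumors.
reference_set = [float(i) for i in range(40)]
rank_mid = percentile_in_reference(10.0, reference_set)
rank_low = percentile_in_reference(-1.0, reference_set)
rank_high = percentile_in_reference(100.0, reference_set)
```

With a sufficiently large reference set, as the text requires, the rank should change little if a different set of reference tumors is substituted.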

[0080] Further details of the invention will be described in the following 
non-limiting Example. 

Example 

A Phase II Study of Gene Expression in 79 Malignant Breast Tumors 
[0081] A gene expression study was designed and conducted with the primary 
goal to molecularly characterize gene expression in paraffin-embedded, fixed tissue 
samples of invasive breast ductal carcinoma, and to explore the correlation between such 
molecular profiles and disease-free survival. 

Study design 

[0082] Molecular assays were performed on paraffin-embedded, formalin- 
fixed primary breast tumor tissues obtained from 79 individual patients diagnosed with 
invasive breast cancer. All patients in the study had 10 or more positive nodes. Mean 
age was 57 years, and mean clinical tumor size was 4.4 cm. Patients were included in the 
study only if histopathologic assessment, performed as described in the Materials and 
Methods section, indicated adequate amounts of tumor tissue and homogeneous 
pathology. 

Materials and Methods 

[0083] Each representative tumor block was characterized by standard 
histopathology for diagnosis, semi-quantitative assessment of amount of tumor, and 
tumor grade. A total of 6 sections (10 microns in thickness each) were prepared and 
placed in two Costar Brand Microcentrifuge Tubes (Polypropylene, 1.7 mL tubes, clear; 3 
sections in each tube). If the tumor constituted less than 30% of the total specimen area, 
the sample may have been crudely dissected by the pathologist, using gross 
microdissection, putting the tumor tissue directly into the Costar tube. 

[0084] If more than one tumor block was obtained as part of the surgical 
procedure, the block most representative of the pathology was used for analysis. 

Gene Expression Analysis 

[0085] mRNA was extracted and purified from fixed, paraffin-embedded 
tissue samples, and prepared for gene expression analysis as described in section 9 above. 

[0086] Molecular assays of quantitative gene expression were performed by 
RT-PCR, using the ABI PRISM 7900™ Sequence Detection System™ (Perkin-Elmer- 
Applied Biosystems, Foster City, CA, USA). ABI PRISM 7900™ consists of a 
thermocycler, laser, charge-coupled device (CCD), camera and computer. The system 
amplifies samples in a 384-well format on a thermocycler. During amplification, 
laser-induced fluorescent signal is collected in real-time through fiber optics cables for all 
384 wells, and detected at the CCD. The system includes software for running the 
instrument and for analyzing the data. 

Analysis and Results 

[0087] Tumor tissue was analyzed for 185 cancer-related genes and 7 
reference genes. The threshold cycle (CT) values for each patient were normalized based 
on the median of the 7 reference genes for that particular patient. Clinical outcome data 
were available for all patients from a review of registry data and selected patient charts. 
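
The per-patient normalization described above can be sketched as follows. The sign convention (reference median minus gene Ct, so that a larger value means more mRNA) is an assumption made for illustration, as the text does not state it explicitly; the gene labels and Ct values are hypothetical, and only three reference genes are shown instead of seven to keep the example short.

```python
from statistics import median


def normalize_patient(ct_values, reference_genes):
    """Normalize one patient's Ct values to the median Ct of the reference genes.

    Returns reference-median minus gene Ct for every non-reference gene,
    so higher values correspond to higher expression (assumed convention).
    """
    ref_median = median(ct_values[g] for g in reference_genes)
    return {
        gene: ref_median - ct
        for gene, ct in ct_values.items()
        if gene not in reference_genes
    }


# Hypothetical Ct values for one patient.
patient_ct = {"REF1": 20.0, "REF2": 22.0, "REF3": 21.0, "GENE_X": 24.0, "GENE_Y": 19.0}
normalized = normalize_patient(patient_ct, {"REF1", "REF2", "REF3"})
```

Using the median rather than the mean makes the normalization less sensitive to a single aberrant reference gene in a given sample.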

[0088] Outcomes were classified as: 

0 died due to breast cancer or to unknown cause or alive with breast cancer 
recurrence; 

1 alive without breast cancer recurrence or died due to a cause other than 
breast cancer 

[0089] Analysis was performed by: 

1. Analysis of the relationship between normalized gene expression and the 
binary outcomes of 0 or 1. 

2. Analysis of the relationship between normalized gene expression and the 
time to outcome (0 or 1 as defined above) where patients who were alive without breast 
cancer recurrence or who died due to a cause other than breast cancer were censored. 
This approach was used to evaluate the prognostic impact of individual genes and also 
sets of multiple genes. 
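
To illustrate how censoring enters a time-to-outcome analysis, the sketch below computes a Kaplan-Meier estimate using the outcome coding defined above (0 = event, 1 = censored). The Kaplan-Meier estimator is used here only as a familiar example of handling censored data and is an assumption; the text does not state which survival model was applied. The follow-up times are hypothetical.

```python
def kaplan_meier(times, outcomes):
    """Kaplan-Meier estimate of the probability of remaining event-free.

    times    -- follow-up time for each patient
    outcomes -- 0 = recurrence or breast cancer death (event), 1 = censored
    Returns a list of (time, estimated probability) at each event time.
    """
    records = sorted(zip(times, outcomes))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        j = i
        events = 0
        # Group all patients sharing the same follow-up time.
        while j < len(records) and records[j][0] == t:
            if records[j][1] == 0:
                events += 1
            j += 1
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= j - i
        i = j
    return curve


# Hypothetical follow-up (months) for four patients.
curve = kaplan_meier([12, 20, 30, 36], [0, 1, 0, 1])
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve.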

Analysis of patients with invasive breast carcinoma by binary approach 
[0090] In the first (binary) approach, analysis was performed on all 79 patients 
with invasive breast carcinoma. A t test was performed on the groups of patients 
classified as either no recurrence and no breast cancer related death at three years, versus 
recurrence, or breast cancer-related death at three years, and the p-values for the 
differences between the groups for each gene were calculated. 
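
A two-group comparison of this kind can be sketched with a pooled-variance (Student's) t statistic. The choice of the pooled-variance form is an assumption; it is consistent with degrees of freedom equal to n1 + n2 - 2, but the text does not name the variant used. The data below are made-up, the function name is hypothetical, and the p-value calculation is omitted because it requires the t distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for a two-sample t test with pooled variance.

    A positive t means the mean of group_a exceeds the mean of group_b.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical normalized expression of one gene in two outcome groups.
no_recurrence = [1.0, 2.0, 3.0, 4.0, 5.0]
recurrence = [2.0, 3.0, 4.0, 5.0, 6.0]
t_value, dof = pooled_t(no_recurrence, recurrence)
```

With the no-recurrence group passed first, a positive t corresponds to higher expression in patients with the better outcome.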

[0091] Table 1 lists the 47 genes for which the p-value for the differences 
between the groups was <0.10. The first column of mean expression values pertains to 
patients who neither had a metastatic recurrence of, nor died from, breast cancer. The 
second column of mean expression values pertains to patients who either had a metastatic 
recurrence of, or died from, breast cancer. 

Table 1 

Gene            Mean       Mean    t-value   df         p  Valid N  Valid N

Bcl2        -0.15748   -1.22816    4.00034   75  0.000147       35       42
PR          -2.67225   -5.49747    3.61540   75  0.000541       35       42
IGF1R       -0.59390   -1.71506    3.49158   75  0.000808       35       42
BAG1         0.18844   -0.68509    3.42973   75  0.000985       35       42
CD68        -0.52275    0.10983   -3.41186   75  0.001043       35       42
EstR1       -0.35581   -3.00699    3.32190   75  0.001384       35       42
CTSL        -0.64894   -0.09204   -3.26781   75  0.001637       35       42
IGFBP2      -0.81181   -1.78398    3.24158   75  0.001774       35       42
GATA3        1.80525    0.57428    3.15608   75  0.002303       35       42
TP53BP2     -4.71118   -6.09289    3.02888   75  0.003365       35       42
EstR1        3.67801    1.64693    3.01073   75  0.003550       35       42
CEGP1       -2.02566   -4.25537    2.85620   75  0.005544       35       42
SURV        -3.67493   -2.96982   -2.70544   75  0.008439       35       42
p27          0.80789    0.28807    2.55401   75  0.012678       35       42
Chk1        -3.37981   -2.80389   -2.46979   75  0.015793       35       42
BBC3        -4.71789   -5.62957    2.46019   75  0.016189       35       42
ZNF217       1.10038    0.62730    2.42282   75  0.017814       35       42
EGFR        -2.88172   -2.20556   -2.34774   75  0.021527       35       42
CD9          1.29955    0.91025    2.31439   75  0.023386       35       42
MYBL2       -3.77489   -3.02193   -2.29042   75  0.024809       35       42
HIF1A       -0.44248    0.03740   -2.25950   75  0.026757       35       42
GRB7        -1.96063   -1.05007   -2.25801   75  0.026854       35       42
pS2         -1.00691   -3.13749    2.24070   75  0.028006       35       42
RIZ1        -7.62149   -8.38750    2.20226   75  0.030720       35       42
ErbB3       -6.89508   -7.44326    2.16127   75  0.033866       35       42
TOP2B        0.45122    0.12665    2.14616   75  0.035095       35       42
MDM2         1.09049    0.69001    2.10967   75  0.038223       35       42
PRAME       -6.40074   -7.70424    2.08126   75  0.040823       35       42
GUS         -1.51683   -1.89280    2.05200   75  0.043661       35       42
RAD51C      -5.85618   -6.71334    2.04575   75  0.044288       35       42
AIB1        -3.08217   -2.28784   -2.00600   75  0.048462       35       42
[row illegible in source]
GAPDH       -0.35829   -0.02292   -1.94326   75  0.055737       35       42
FHIT        -3.00431   -3.67175    1.86927   75  0.065489       35       42
KRT19        2.52397    2.01694    1.85741   75  0.067179       35       42
TS          -2.83607   -2.29048   -1.83712   75  0.070153       35       42
GSTM1       -3.69140   -4.38623    1.83397   75  0.070625       35       42
G-Catenin    0.31875   -0.15524    1.80823   75  0.074580       35       42
AKT2         0.78858    0.46703    1.79276   75  0.077043       35       42
CCNB1       -4.26197   -3.51628   -1.78803   75  0.077810       35       42
PI3KC2A     -2.27401   -2.70265    1.76748   75  0.081215       35       42
FBXO5       -4.72107   -4.24411   -1.75935   75  0.082596       35       42
DR5         -5.80850   -6.55501    1.74345   75  0.085353       35       42
CIAP1       -2.81825   -3.09921    1.72480   75  0.088683       35       42
MCM2        -2.87541   -2.50683   -1.72061   75  0.089445       35       42
CCND1        1.30995    0.80905    1.68794   75  0.095578       35       42
EIF4E       -5.37657   -6.47156    1.68169   75  0.096788       35       42


[0092] In the foregoing Table 1, negative t-values indicate higher expression 
associated with worse outcomes, and, inversely, higher (positive) t-values indicate higher 
expression associated with better outcomes. Thus, for example, elevated expression of the 
CD68 gene (t-value = -3.41, CT mean alive < CT mean deceased) indicates a reduced 
likelihood of disease free survival. Similarly, elevated expression of the Bcl2 gene 
(t-value = 4.00; CT mean alive > CT mean deceased) indicates an increased likelihood of 
disease free survival. 

[0093] Based on the data set forth in Table 1, the expression of any of the 
following genes in breast cancer above a defined expression threshold indicates a reduced 
likelihood of survival without cancer recurrence following surgery: Grb7, CD68, CTSL, 
Chk1, Her2, STK15, AIB1, SURV, EGFR, MYBL2, HIF1α. 

[0094] Based on the data set forth in Table 1, the expression of any of the 
following genes in breast cancer above a defined expression threshold indicates a better 
prognosis for survival without cancer recurrence following surgery: TP53BP2, PR, Bcl2, 
KRT14, EstR1, IGFBP2, BAG1, CEGP1, KLK10, β Catenin, GSTM1, FHIT, Riz1, 
IGF1, BBC3, IGFR1, TBP, p27, IRS1, IGF1R, GATA3, CEGP1, ZNF217, CD9, pS2, 
ErbB3, TOP2B, MDM2, RAD51, and KRT19. 

Analysis of ER positive patients by binary approach 

[0095] Fifty-seven patients with normalized CT for estrogen receptor (ER) >0 (i.e.,
ER-positive patients) were subjected to separate analysis. A t-test was performed on the two
groups of patients classified as either no recurrence and no breast cancer-related death at
three years, or recurrence or breast cancer-related death at three years, and the p-values
for the differences between the groups for each gene were calculated. Table 2, below,
lists the genes where the p-value for the differences between the groups was <0.105. The
first column of mean expression values pertains to patients who neither had a metastatic
recurrence nor died from breast cancer. The second column of mean expression values
pertains to patients who either had a metastatic recurrence of or died from breast cancer.

Table 2 





Gene          Mean      Mean   t-value  df         p  Valid N  Valid N
IGF1R     -0.13975  -1.00435   3.65063  55  0.000584       30       27
Bcl2       0.15345  -0.70480   3.55488  55  0.000786       30       27
CD68      -0.54779   0.19427  -3.41818  55  0.001193       30       27
HNF3A      0.39617  -0.63802   3.20750  55  0.002233       30       27
CTSL      -0.66726   0.00354  -3.20692  55  0.002237       30       27
TP53BP2   -4.81858  -6.44425   3.13698  55  0.002741       30       27
GATA3      2.33386   1.40803   3.02958  55  0.003727       30       27
BBC3      -4.54979  -5.72333   2.91943  55  0.005074       30       27
RAD51C    -5.63363  -6.94841   2.85475  55  0.006063       30       27
BAG1       0.31087  -0.50669   2.61524  55  0.011485       30       27
IGFBP2    -0.49300  -1.30983   2.59121  55  0.012222       30       27
FBXO5     -4.86333  -4.05564  -2.56325  55  0.013135       30       27
EstR1      0.68368  -0.66555   2.56090  55  0.013214       30       27
PR        -1.89094  -3.86602   2.52803  55  0.014372       30       27
SURV      -3.87857  -3.10970  -2.49622  55  0.015579       30       27
CD9        1.41691   0.91725   2.43043  55  0.018370       30       27
RB1       -2.51662  -2.97419   2.41221  55  0.019219       30       27
EPHX1     -3.91703  -5.85097   2.29491  55  0.025578       30       27
CEGP1     -1.18600  -2.95139   2.26608  55  0.027403       30       27
CCNB1     -4.44522  -3.35763  -2.25148  55  0.028370       30       27
TRAIL      0.34893  -0.56574   2.20372  55  0.031749       30       27
EstR1      4.60346   3.60340   2.20223  55  0.031860       30       27
DR5       -5.71827  -6.79088   2.14548  55  0.036345       30       27
MCM2      -2.96800  -2.48458  -2.10518  55  0.039857       30       27
Chk1      -3.46968  -2.85708  -2.08597  55  0.041633       30       27
p27        0.94714   0.49656   2.04313  55  0.045843       30       27
MYBL2     -3.97810  -3.14837  -2.02921  55  0.047288       30       27
GUS       -1.42486  -1.82900   1.99758  55  0.050718       30       27
P53       -1.08810  -1.47193   1.92087  55  0.059938       30       27
HIF1A     -0.40925   0.11688  -1.91278  55  0.060989       30       27
cMet      -6.36835  -5.58479  -1.88318  55  0.064969       30       27
EGFR      -2.95785  -2.28105  -1.86840  55  0.067036       30       27
MTA1      -7.55365  -8.13656   1.81479  55  0.075011       30       27
RIZ1      -7.52785  -8.25903   1.79518  55  0.078119       30       27
ErbB3     -6.62488  -7.10826   1.79255  55  0.078545       30       27
TOP2B      0.54974   0.27531   1.74888  55  0.085891       30       27
EIF4E     -5.06603  -6.31426   1.68030  55  0.098571       30       27
TS        -2.95042  -2.36167  -1.67324  55  0.099959       30       27
STK15     -3.25010  -2.72118  -1.64822  55  0.105010       30       27
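
As an illustration of how the t-value and df columns of Table 2 relate to the two patient groups, the following Python sketch computes a two-sample t statistic. It assumes the pooled-variance (Student) form of the test, which is consistent with the reported df = 30 + 27 - 2 = 55 but is not named explicitly in the text. The function name and the sample values in the usage note are hypothetical.

```python
import math


def pooled_t(group_a, group_b):
    """Return (t, df) for a pooled-variance two-sample t-test.

    group_a: expression values for patients without recurrence or death
    group_b: expression values for patients with recurrence or death
    A positive t means the mean of group_a exceeds the mean of group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sums of squared deviations around each group mean
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_err, df
```

For example, `pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])` gives a negative t with 4 degrees of freedom, the same sign pattern as the genes in Table 2 whose mean is lower in the first group than in the second.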



[0096] For each gene, a classification algorithm was utilized to identify the 
best threshold value (CT) for using each gene alone in predicting clinical outcome. 
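
The classification algorithm used to pick each threshold is not specified. As one possible reading, the sketch below tries every midpoint between adjacent distinct expression values and keeps the cut point that classifies the most patients correctly, allowing high expression to indicate either bad or good outcome. The exhaustive search, the accuracy criterion, and the function name are all assumptions made for illustration.

```python
def best_threshold(values, recurred):
    """Search for the single-gene cut point with the highest accuracy.

    values:   normalized expression values, one per patient
    recurred: booleans, True if the patient had recurrence or death
    Returns (threshold, accuracy, high_is_bad), or None if fewer than
    two distinct expression values are available.
    """
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values
    candidates = [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]
    best = None
    for threshold in candidates:
        for high_is_bad in (True, False):
            correct = 0
            for value, event in zip(values, recurred):
                # Predict an event above the threshold when high_is_bad,
                # otherwise predict an event at or below the threshold
                predicted = (value > threshold) == high_is_bad
                if predicted == event:
                    correct += 1
            accuracy = correct / len(values)
            if best is None or accuracy > best[1]:
                best = (threshold, accuracy, high_is_bad)
    return best
```

A cut point chosen and evaluated on the same patients will tend to look better than it would on new patients, so a threshold found this way would normally be checked on an independent sample.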

[0097] Based on the data set forth in Table 2, expression of the following
genes in ER-positive cancer above a defined expression level is indicative of a reduced
likelihood of survival without cancer recurrence following surgery: CD68; CTSL;
FBXO5; SURV; CCNB1; MCM2; Chk1; MYBL2; HIF1A; cMet; EGFR; TS; STK15.
Many of these genes (CD68, CTSL, SURV, CCNB1, MCM2, Chk1, MYBL2, EGFR, and
STK15) were also identified as indicators of poor prognosis in the previous analysis, not
limited to ER-positive breast cancer. Based on the data set forth in Table 2, expression of
the following genes in ER-positive cancer above a defined expression level is indicative
of a better prognosis for survival without cancer recurrence following surgery: IGF1R;
Bcl2; HNF3A; TP53BP2; GATA3; BBC3; RAD51C; BAG1; IGFBP2; PR; CD9; RB1;
EPHX1; CEGP1; TRAIL; DR5; p27; p53; MTA1; RIZ1; ErbB3; TOP2B; EIF4E. Of the
latter genes, IGF1R; Bcl2; TP53BP2; GATA3; BBC3; RAD51C; BAG1; IGFBP2; PR;
CD9; CEGP1; DR5; p27; RIZ1; ErbB3; TOP2B; EIF4E have also been identified as
indicators of good prognosis in the previous analysis, not limited to ER-positive breast
cancer.

Analysis of ER negative patients by binary approach

[0098] Twenty patients with normalized CT for estrogen receptor (ER) <1.6
(i.e., ER-negative patients) were subjected to separate analysis. A t-test was performed on
the two groups of patients classified as either no recurrence and no breast cancer-related
death at three years, or recurrence or breast cancer-related death at three years, and the
p-values for the differences between the groups for each gene were calculated. Table 3 lists
the genes where the p-value for the differences between the groups was <0.118. The first
column of mean expression values pertains to patients who neither had a metastatic
recurrence nor died from breast cancer. The second column of mean expression values
pertains to patients who either had a metastatic recurrence of or died from breast cancer.

Table 3 



Gene              Mean      Mean   t-value  df         p  Valid N  Valid N
KRT14         -1.95323  -6.69231   4.03303  18  0.000780        5       15
KLK10         -2.68043  -7.11288   3.10321  18  0.006136        5       15
CCND1         -1.02285   0.03732  -2.77992  18  0.012357        5       15
Upa           -0.91272  -0.04773  -2.49460  18  0.022560        5       15
HNF3A         -6.04780  -2.36469  -2.43148  18  0.025707        5       15
Maspin        -3.56145  -6.18678   2.40169  18  0.027332        5       15
CDH1          -3.54450  -2.34984  -2.38755  18  0.028136        5       15
HER2          -1.48973   1.53108  -2.35826  18  0.029873        5       15
GRB7          -2.55289   0.00036  -2.32890  18  0.031714        5       15
AKT1          -0.36849   0.46222  -2.29737  18  0.033807        5       15
TGFA          -4.03137  -5.67225   2.28546  18  0.034632        5       15
FRP1           1.45776  -1.39459   2.27884  18  0.035097        5       15
STMY3         -1.59610  -0.26305  -2.23191  18  0.038570        5       15
Contig 27882  -4.27585  -7.34338   2.18700  18  0.042187        5       15
A-Catenin     -1.19790  -0.39085  -2.15624  18  0.044840        5       15
VDR           -4.37823  -2.37167  -2.15620  18  0.044844        5       15
GRO1          -3.65034  -5.97002   2.12286  18  0.047893        5       15
MCM3          -3.86041  -5.55078   2.10030  18  0.050061        5       15
B-actin        4.69672   5.19190  -2.04951  18  0.055273        5       15
HIF1A         -0.64183  -0.10566  -2.02301  18  0.058183        5       15
MMP9          -8.90613  -7.35163  -1.88747  18  0.075329        5       15
VEGF           0.37904   1.10778  -1.87451  18  0.077183        5       15
PRAME         -4.95855  -7.41973   1.86668  18  0.078322        5       15
AIB1          -3.12245  -1.92934  -1.86324  18  0.078829        5       15
KRT5          -1.32418  -3.62027   1.85919  18  0.079428        5       15
KRT18          1.08383   2.25369  -1.83831  18  0.082577        5       15
KRT17         -0.69073  -3.56536   1.78449  18  0.091209        5       15
P14ARF        -1.87104  -3.36534   1.63923  18  0.118525        5       15



[0099] Based on the data set forth in Table 3, expression of the following
genes in ER-negative cancer above a defined expression level is indicative of a reduced
likelihood of survival without cancer recurrence (p<0.05): CCND1; UPA; HNF3A;
CDH1; Her2; GRB7; AKT1; STMY3; α-Catenin; VDR; GRO1. Only 2 of these genes
(Her2 and Grb7) were also identified as indicators of poor prognosis in the previous
analysis, not limited to ER-negative breast cancer. Based on the data set forth in Table 3,
expression of the following genes in ER-negative cancer above a defined expression level
is indicative of a better prognosis for survival without cancer recurrence: KRT14; KLK10;
Maspin; TGFα; and FRP1. Of the latter genes, only KLK10 has been identified as an
indicator of good prognosis in the previous analysis, not limited to ER-negative breast
cancer.

Analysis of multiple genes and indicators of outcome 

[0100] Two approaches were taken in order to determine whether using 
multiple genes would provide better discrimination between outcomes. 

[0101] First, a discrimination analysis was performed using a forward 
stepwise approach. Models were generated that classified outcome with greater 
discrimination than was obtained with any single gene alone. 

[0102] According to a second approach (time-to-event approach), for each 
gene a Cox Proportional Hazards model (see, e.g. Cox, D. R., and Oakes, D. (1984), 
Analysis of Survival Data, Chapman and Hall, London, New York) was defined with time 
to recurrence or death as the dependent variable, and the expression level of the gene as 
the independent variable. The genes that have a p-value < 0.10 in the Cox model were 
identified. For each gene, the Cox model provides the relative risk (RR) of recurrence or 
death for a unit change in the expression of the gene. One can choose to partition the 
patients into subgroups at any threshold value of the measured expression (on the CT 
scale), where all patients with expression values above the threshold have higher risk, and 
all patients with expression values below the threshold have lower risk, or vice versa, 
depending on whether the gene is an indicator of bad (RR>1.01) or good (RR<1.01) 
prognosis. Thus, any threshold value will define subgroups of patients with respectively 
increased or decreased risk. The results are summarized in Table 4. The third column, 
with the heading: exp(coef), shows RR values.

Table 4

Gene       coef       exp(coef)  se(coef)  z          p
TP53BP2    -0.21892   0.803386   0.068279  -3.20625   0.00134
GRB7       0.235697   1.265791   0.073541  3.204992   0.00135
PR         -0.10258   0.90251    0.035864  -2.86018   0.00423
CD68       0.465623   1.593006   0.167785  2.775115   0.00552
Bcl2       -0.26769   0.765146   0.100785  -2.65603   0.00791
KRT14      -0.11892   0.887877   0.046938  -2.53359   0.0113
PRAME      -0.13707   0.871912   0.054904  -2.49649   0.0125
CTSL       0.431499   1.539564   0.185237  2.329444   0.0198
EstR1      -0.07686   0.926018   0.034848  -2.20561   0.0274
Chk1       0.284466   1.329053   0.130823  2.174441   0.0297
IGFBP2     -0.2152    0.806376   0.099324  -2.16669   0.0303
HER2       0.155303   1.168011   0.072633  2.13818    0.0325
BAG1       -0.22695   0.796959   0.106377  -2.13346   0.0329
CEGP1      -0.07879   0.924236   0.036959  -2.13177   0.033
STK15      0.27947    1.322428   0.132762  2.105039   0.0353
KLK10      -0.11028   0.895588   0.05245   -2.10248   0.0355
B.Catenin  -0.16536   0.847586   0.084796  -1.95013   0.0512
EstR1      -0.0803    0.922842   0.042212  -1.90226   0.0571
GSTM1      -0.13209   0.876266   0.072211  -1.82915   0.0674
TOP2A      -0.11148   0.894512   0.061855  -1.80222   0.0715
AIB1       0.152968   1.165288   0.086332  1.771861   0.0764
FHIT       -0.15572   0.855802   0.088205  -1.7654    0.0775
RIZ1       -0.17467   0.839736   0.099464  -1.75609   0.0791
SURV       0.185784   1.204162   0.106625  1.742399   0.0814
IGF1       -0.10499   0.900338   0.060482  -1.73581   0.0826
BBC3       -0.1344    0.874243   0.077613  -1.73163   0.0833
IGF1R      -0.13484   0.873858   0.077889  -1.73115   0.0834
DIABLO     0.284336   1.32888    0.166556  1.707148   0.0878
TBP        -0.34404   0.7089     0.20564   -1.67303   0.0943
p27        -0.26002   0.771033   0.1564    -1.66256   0.0964
IRS1       -0.07585   0.926957   0.046096  -1.64542   0.0999
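
The relationship between the coef and exp(coef) columns of Table 4, and the threshold-based partition of patients described for the time-to-event approach, can be sketched as follows. The coefficient values are taken from Table 4; the function names and the patient expression values used in the usage note are hypothetical. The sketch relies only on the standard Cox interpretation that a positive coefficient means risk rises with expression.

```python
import math


def relative_risk(coef, delta=1.0):
    """Relative risk for a change of `delta` expression units.

    For a Cox model with a single covariate, RR = exp(coef * delta),
    so the exp(coef) column is the RR for a one-unit change.
    """
    return math.exp(coef * delta)


def split_by_threshold(expression_by_patient, threshold, coef):
    """Partition patients at a threshold into (higher_risk, lower_risk).

    With a positive coefficient, patients above the threshold form the
    higher-risk group; with a negative coefficient the groups swap.
    """
    above = [p for p, v in expression_by_patient.items() if v > threshold]
    below = [p for p, v in expression_by_patient.items() if v <= threshold]
    return (above, below) if coef > 0 else (below, above)


GRB7_COEF = 0.235697      # Table 4, indicator of worse outcome
TP53BP2_COEF = -0.21892   # Table 4, indicator of better outcome
```

For instance, `relative_risk(GRB7_COEF)` reproduces the tabulated value of about 1.2658, and `relative_risk(GRB7_COEF, 2.0)` gives the relative risk for a two-unit increase in expression.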



[0103] The binary and time-to-event analyses, with few exceptions, identified 
the same genes as prognostic markers. For example, comparison of Tables 1 and 4 shows 
that 10 genes were represented in the top 15 genes in both lists. Furthermore, when both 
analyses identified the same gene at [p<0.10], which happened for 21 genes, they were 
always concordant with respect to the direction (positive or negative sign) of the 
correlation with survival/recurrence. Overall, these results strengthen the conclusion that 
the identified markers have significant prognostic value. 

[0104] For Cox models comprising more than two genes (multivariate
models), stepwise entry of each individual gene into the model is performed, where the
first gene entered is pre-selected from among those genes having significant univariate
p-values, and the gene selected for entry into the model at each subsequent step is the gene
that best improves the fit of the model to the data. This analysis can be performed with
any total number of genes. In the analysis the results of which are shown below, stepwise
entry was performed for up to 10 genes.

[0105] Multivariate analysis is performed using the following equation:

RR = exp[coef(geneA) × Ct(geneA) + coef(geneB) × Ct(geneB) + coef(geneC) × Ct(geneC) + ...].

[0106] In this equation, coefficients for genes that are predictors of beneficial
outcome are positive numbers and coefficients for genes that are predictors of
unfavorable outcome are negative numbers. The "Ct" values in the equation are ΔCts, i.e.,
they reflect the difference between the average normalized Ct value for a population and the
normalized Ct measured for the patient in question. The convention used in the present
analysis has been that ΔCts below and above the population average have positive signs
and negative signs, respectively (reflecting greater or lesser mRNA abundance). The
relative risk (RR) calculated by solving this equation will indicate if the patient has an
enhanced or reduced chance of long-term survival without cancer recurrence.
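
A minimal sketch of the multivariate calculation described above is given below. It follows the stated sign convention by taking each ΔCt as the population average minus the patient's normalized Ct. The gene labels, coefficients, and Ct values in the usage note are hypothetical placeholders, since no multivariate coefficients are listed in the text.

```python
import math


def delta_ct(population_mean_ct, patient_ct):
    """ΔCt under the stated convention: positive when the patient's
    normalized Ct lies below the population average."""
    return population_mean_ct - patient_ct


def multivariate_rr(coefs, delta_cts):
    """RR = exp(sum over genes of coef(gene) * ΔCt(gene)).

    coefs:     dict mapping gene label -> model coefficient
    delta_cts: dict mapping gene label -> ΔCt for the patient
    """
    missing = set(coefs) - set(delta_cts)
    if missing:
        raise KeyError(f"no ΔCt supplied for: {sorted(missing)}")
    return math.exp(sum(coefs[gene] * delta_cts[gene] for gene in coefs))
```

As a usage example with placeholder numbers, `multivariate_rr({"geneA": 0.5, "geneB": -0.25}, {"geneA": 1.0, "geneB": 2.0})` returns 1.0, because the two gene terms cancel exactly.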

Multivariate gene analysis of 79 patients with invasive breast carcinoma

[0107] A multivariate stepwise analysis, using the Cox Proportional Hazards
Model, was performed on the gene expression data obtained for all 79 patients with
invasive breast carcinoma. The following ten-gene sets have been identified by this
analysis as having particularly strong predictive value of patient survival:

(a) TP53BP2, Bcl2, BAD, EPHX1, PDGFRβ, DIABLO, XIAP, YB1, CA9, and KRT8.

(b) GRB7, CD68, TOP2A, Bcl2, DIABLO, CD3, ID1, PPM1D, MCM6, and WISP1.

(c) PR, TP53BP2, PRAME, DIABLO, CTSL, IGFBP2, TIMP1, CA9, MMP9, and COX2.

(d) CD68, GRB7, TOP2A, Bcl2, DIABLO, CD3, ID1, PPM1D, MCM6, and WISP1.

(e) Bcl2, TP53BP2, BAD, EPHX1, PDGFRβ, DIABLO, XIAP, YB1, CA9, and KRT8.

(f) KRT14, KRT5, PRAME, TP53BP2, GUS1, AIB1, MCM3, CCNE1, MCM6, and ID1.

(g) PRAME, TP53BP2, EstR1, DIABLO, CTSL, PPM1D, GRB7, DAPK1, BBC3, and VEGFB.

(h) CTSL2, GRB7, TOP2A, CCNB1, Bcl2, DIABLO, PRAME, EMS1, CA9, and EpCAM.

(i) EstR1, TP53BP2, PRAME, DIABLO, CTSL, PPM1D, GRB7, DAPK1, BBC3, and VEGFB.

(k) Chk1, PRAME, p53BP2, GRB7, CA9, CTSL, CCNB1, TOP2A, tumor size, and IGFBP2.

(l) IGFBP2, GRB7, PRAME, DIABLO, CTSL, β-Catenin, PPM1D, Chk1, WISP1, and LOT1.

(m) HER2, TP53BP2, Bcl2, DIABLO, TIMP1, EPHX1, TOP2A, TRAIL, CA9, and AREG.

(n) BAG1, TP53BP2, PRAME, IL6, CCNB1, PAI1, AREG, tumor size, CA9, and Ki67.

(o) CEGP1, TP53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, STK15, AKT2, and FGF18.

(p) STK15, TP53BP2, PRAME, IL6, CCNE1, AKT2, DIABLO, cMet, CCNE2, and COX2.

(q) KLK10, EstR1, TP53BP2, PRAME, DIABLO, CTSL, PPM1D, GRB7, DAPK1, and BBC3.

(r) AIB1, TP53BP2, Bcl2, DIABLO, TIMP1, CD3, p53, CA9, GRB7, and EPHX1.

(s) BBC3, GRB7, CD68, PRAME, TOP2A, CCNB1, EPHX1, CTSL, GSTM1, and APC.

(t) CD9, GRB7, CD68, TOP2A, Bcl2, CCNB1, CD3, DIABLO, ID1, and PPM1D.

(w) EGFR, KRT14, GRB7, TOP2A, CCNB1, CTSL, Bcl2, TP, KLK10, and CA9.

(x) HIF1α, PR, DIABLO, PRAME, Chk1, AKT2, GRB7, CCNE1, TOP2A, and CCNB1.

(y) MDM2, TP53BP2, DIABLO, Bcl2, AIB1, TIMP1, CD3, p53, CA9, and HER2.

(z) MYBL2, TP53BP2, PRAME, IL6, Bcl2, DIABLO, CCNE1, EPHX1, TIMP1, and CA9.

(aa) p27, TP53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, STK15, AKT2, and ID1.

(ab) RAD51, GRB7, CD68, TOP2A, CIAP2, CCNB1, BAG1, IL6, FGFR1, and TP53BP2.

(ac) SURV, GRB7, TOP2A, PRAME, CTSL, GSTM1, CCNB1, VDR, CA9, and CCNE2.

(ad) TOP2B, TP53BP2, DIABLO, Bcl2, TIMP1, AIB1, CA9, p53, KRT8, and BAD.

(ae) ZNF217, GRB7, p53BP2, PRAME, DIABLO, Bcl2, COX2, CCNE1, APC4, and β-Catenin.

[0108] While the present invention has been described with reference to what 
are considered to be the specific embodiments, it is to be understood that the invention is 
not limited to such embodiments. To the contrary, the invention is intended to cover 
various modifications and equivalents included within the spirit and scope of the 
appended claims. For example, while the disclosure focuses on the identification of 
various breast cancer associated genes and gene sets, and on the personalized prognosis of 
breast cancer, similar genes, gene sets and methods concerning other types of cancer are 
specifically within the scope herein. 

[0109] All references cited throughout the disclosure are hereby expressly
incorporated by reference.

Table 5A

Gene            Accession  Seq
AIB1            NM_006534  gcggcgagtttccgatttaaagctgagctgcgaggaaaatggcggcgggaggatcaaaatacttgctggatggtgga
AKT1            NM_005163  cgcttctatggcgctgagattgtgtcagccctggactacctgcactcggagaagaacgtggtgtaccggga
AKT2            NM_001626  tcctgccacccttcaaacctcaggtcacgtccgaggtcgacacaaggtacttcgatgatgaatttaccgcc
APC             NM_000038  ggacagcaggaatgtgtttctccatacaggtcacggggagccaatggttcagaaacaaatcgagtgggt
AREG            NM_001657  tgtgagtgaaatgccttctagtagtgaaccgtcctcgggagccgactatgactactcagaagagtatgataacgaaccacaa
B-actin         NM_001101  CAGCAGATGTGGATCAGCAAGCAGGAGTATGACGAGTCCGGCCCCTCCATCGTCCACCGCAAATGC
B-Catenin       NM_001904  GGCTCTTGTGCGTACTGTCCTTCGGGCTGGTGACAGGGAAGACATCACTGAGCCTGCCATCTGTGCTCTTCGTCATCTGA
BAD             NM_032989  GGGTCAGGTGCCTCGAGATCGGGCTTGGGCCCAGAGCATGTTCCAGATCCCAGAGTTTGAGCCGAGTGAGCAG
BAG1            NM_004323  CGTTGTCAGCACTTGGAATACAAGATGGTTGCCGGGTCATGTTAATTGGGAAAAAGAACAGTCCACAGGAAGAGGTTGAAC
BBC3            NM_014417  cctggagggtcctgtacaatctcatcatgggactcctgcccttacccaggggccacagagcccccgagatggagcccaattag
Bcl2            NM_000633  cagatggacctagtacccactgagatttccacgccgaaggacagcgatgggaaaaatgcccttaaatcatagg
CA9             NM_001216  ATCCTAGCCCTGGTTTTTGGCCTCCTTTTTGCTGTCACCAGCGTCGCGTTCCTTGTGCAGATGAGAAGGCAG
CCNB1           NM_031966  ttcaggttgttgcaggagaccatgtacatgactgtctccattattgatcggttcatgcagaataattgtgtgcccaagaagatg
CCND1           NM_001758  gcatgttcgtggcctctaagatgaaggagaccatccccctgacggccgagaagctgtgcatctacaccg
CCNE1           NM_001238  aaagaagatgatgaccgggtttacccaaactcaacgtgcaagcctcggattattgcaccatccagaggctc
CCNE2           NM_057749  atgctgtggctccttcctaactggggctttcttgacatgtaggttgcttggtaataacctttttgtatatcacaatttgggt
CD3z            NM_000734  agatgaagtggaaggcgcttttcaccgcggccatcctgcaggcacagttgccgattacagaggca
CD68            NM_001251  tggttcccagccctgtgtccacctccaagcccagattcagattcgagtcatgtacacaacccagggtggaggag
CD9             NM_001769  GGGCGTGGAACAGTTTATCTCAGACATCTGCCCCAAGAAGGACGTACTCGAAACCTTCACCGTG
CDH1            NM_004360  TGAGTGTCCCCCGGTATCTTCCCCGCCCTGCCAATCCCGATGAAATTGGAAATTTTATTGATGAAAATCTGAAAGCGGCTG
CEGP1           NM_020974  tgacaatcagcacacctgcattcaccgctcggaagagggcctgagctgcatgaataaggatcacggctgtagtcaca
Chk1            NM_001274  gataaattggtacaagggatcagcttttcccagcccacatgtcctgatcatatgcttttgaatagtcagttacttgg
CIAP1           NM_001166  tgcctgtggtgggaagctcagtaactgggaaccaaaggatgatgctatgtcagaacaccggaggcattttcc
cIAP2           NM_001165  ggatatttccgtggctcttattcaaactctccatcaaatcctgtaaactccagagcaaatcaagatttttctgccttgatgagaag
cMet            NM_000245  gacatttccagtcctgcagtcaatgcctctctgccccaccctttgttcagtgtggctggtgccacgacaaatgtgtgcgatcggag
Contig 27882    AK000618   ggcatcctggcccaaagtttcccaaatccaggcggctagaggcccactgcttcccaactaccagctgagggggtc
COX2            NM_000963  tctgcagagttggaagcactctatggtgacatcgatgctgtggagctgtatcctgcccttctggtagaaaagcctcggc
CTSL            NM_001912  gggaggcttatctcactgagtgagcagaatctggtagactgctctgggcctcaaggcaatgaaggctgcaatgg
CTSL2           NM_001333  tgtctcactgagcgagcagaatctggtggactgttcgcgtcctcaaggcaatcagggctgcaatggt
DAPK1           NM_004938  cgctgacatcatgaatgttcctcgaccggctggaggcgagtttggatatgacaaagacacatcgttgctgaaagaga
DIABLO          NM_019887  cacaatggcggctctgaagagttggctgtcgcggagcgtaacttcattcttcaggtacagacagtgtttgtgt
DR5             NM_003842  ctctgagacagtgcttcgatgactttgcagacttggtgccctttgactcctgggagccgctcatgaggaagttgggcctcatgg
EGFR            NM_005228  tgtcgatggacttccagaaccacctgggcagctgccaaaagtgtgatccaagctgtcccaat
EIF4E           NM_001968  gatctaagatggcgactgtcgaaccggaaaccacccctactcctaatcccccgactacagaagaggagaaaacggaatctaa
EMS1            NM_005231  ggcagtgtcactgagtccttgaaatcctcccctgccccgcgggtctctggattgggacgcacagtgca
EpCAM           NM_002354  GGGCCCTCCAGAACAATGATGGGCTTTATGATCCTGACTGCGATGAGAGCGGGCTCTTTAAGGCCAAGCAGTGCA
EPHX1           NM_000120  accgtaggctctgctctgaatgactctcctgtgggtctggctgcctatattctagagaagttttccacctggacca
ErbB3           NM_001982  cggttatgtcatgccagatacacacctcaaaggtactccctcctcccgggaaggcaccctttcttcagtgggtctcagttc
EstR1           NM_000125  cgtggtgcccctctatgacctgctgctggagatgctggacgcccaccgcctacatgcgcccactagcc
FBXO5           NM_012177  ggctattcctcattttctctacaaagtggcctcagtgaacatgaagaaggtagcctc^
FGF18           NM_003862  CGGTAGTCAAGTCCGGATCAAGGGCAAGGAGACGGAATTCTACCTGTGCATGAACCGCAAAGGCAAGC
FGFR1           NM_023109  CACGGGACATTCACCACATCGACTACTATAAAAAGACAACCAACGGCCGACTGCCTGTGAAGTGGATGGCACCC
FHIT            NM_002012  CCAGTGGAGCGCTTCCATGACCTGCGTCCTGATGAAGTGGCCGATTTGTTTCAGACGACCCAGAGAG
FRP1            NM_003012  TTGGTACCTGTGGGTTAGCATCAAGTTCTCCCCAGGGTAGAATTCAATCAGAGCTCCAGTTTGCATTTGGATGTG
G-Catenin       NM_002230  TCAGCAGCAAGGGCATCATGGAGGAGGATGAGGCCTGCGGGCGCCAGTACACGCTCAAGAAAACCACC
GAPDH           NM_002046  ATTCCACCCATGGCAAATTCCATGGCACCGTCAAGGCTGAGAACGGGAAGCTTGTCATCAATGGAAATCCCATC
GATA3           NM_002051  caaaggagctcactgtggtgtctgtgttccaaccactgaatctggaccccatctgtgaataagccattctgactc
GRB7            NM_005310  ccatctgcatccatcttgtttgggctccccacccttgagaagtgcctcagataataccctggtggcc
GRO1            NM_001511  cgaaaagatgctgaacagtgacaaatccaactgaccagaagggaggaggaagctcactggtggctgttcctga
GSTM1           NM_000561  aagctatg^ggaaaagaagtacacgatgggggacgctcctgattatgacagaagccagtggctgaatgaaaaattcaagctgggcc
GUS             NM_000181  cccactcagtagccaagtcacaatgtttggaaaacagcccgtttacttgagcaagactgataccacctgcgtg
HER2            NM_004448  cggtgtgagaagtgcagcaagccctgtgcccgagtgtgctatggtctgggcatggagcacttgcgagagg
HIF1A           NM_001530  tgaacataaagtctgcaacatggaaggtattgcactgcacaggccacattcacgtatatgataccaacagtaaccaacctca
HNF3A           NM_004496  TCCAGGATGTTAGGAACTGTGAAGATGGAAGGGCATGAAACCAGCGACTGGAACAGCTACTACGCAGACACGC
ID1             NM_002165  AGAACCGCAAGGTGAGCAAGGTGGAGATTCTCCAGCACGTCATCGACTACATCAGGGACCTTCAGTTGGA
IGF1            NM_000618  TCCGGAGCTGTGATCTAAGGAGGCTGGAGATGTATTGCGCACCCCTCAAGCCTGCCAAGTCAGCTCGCTCTGTCCG
IGF1R           NM_000875  GCATGGTAGCCGAAGATTTCACAGTCAAAATCGGAGATTTTGGTATGACGCGAGATATCTATGAGACAGACTATTACCGGAAA
IGFBP2          NM_000597  GTGGACAGCACCATGAACATGTTGGGCGGGGGAGGCAGTGCTGGCCGGAAGCCCCTCAAGTCGGGTATGAAGG
IL6             NM_000600  CCTGAACCTTCCAAAGATGGCTGAAAAAGATGGATGCTTCCAATCTGGATTCAATGAGGAGACTTGCCTGGT
IRS1            NM_005544  CCACAGCTCACCTTCTGTCAGGTGTCCATCCCAGCTCCAGCCAGCTCCCAGAGAGGAAGAGACTGGCACTGAGG
Ki-67           NM_002417  CGGACTTTGGGTGCG^CTTGACGAGCGGTGGTTCGACAAGTGGCCTTGCGGGCCGGATCGTCCCAGTGGAAGAGTTGTAA
KLK10           NM_002776  GCCCAGAGGCTCCATCGTCCATCCTCTTCCTCCCCAGTCGGCTGAACTCTCCCCTTGTCTGCACTGTTCAAACCTCTG
KRT14           NM_000526  GGCCTGCTGAGATCAAAGACTACAGTCCCTACTTCAAGACCATTGAGGACCTGAGGAACAAGATTCTCACAGCCACAGTGGAC
KRT17           NM_000422  CGAGGATTGGTTCTTCAGCAAGACAGAGGAACTGAACCGCGAGGTGGCCACCAACAGTGAGCTGGTGCAGAGT
KRT18           NM_000224  agagatcgaggctctcaaggaggagctgctcttcatgaagaagaaccacgaagaggaagtaaaaggcc
KRT19           NM_002276  tgagcggcagaatcaggagtaccagcggctcatggacatcaagtcgcggctggagcaggagattgccacctaccgca
KRT5            NM_000424  tcagtggagaaggagttggaccagtcaacatctctgttgtcacaagcagtgtttcctctggatatggca
KRT8            NM_002273  ggatgaagcttacatgaacaaggtagagctggagtctcgcctggaagggctgaccgacgagatcaacttcctcaggcagctatatg
LOT1 variant 1  NM_002656  ggaaagaccacctgaaaaaccacctccagacccacgaccccaacaaaatggcctttgggtgtgaggagtgtgggaagaagtac
Maspin          NM_002639  cagatggccactttgagaacattttagctgacaacagtgtgaacgaccagaccaaaatccttgtggttaatgctg
MCM2            NM_004526  gacttttgcccgctacctttcattccggcgtgacaacaatgagctgttgctcttcatactgaagcagttagtggc
MCM3            NM_002388  GGAGAACAATCCCCTTGAGACAGAATATGGCCTTTCTGTCTACAAGGATCACCAGACCATCACCATCCAGGAGAT
MCM6            NM_005915  tgatggtcctatgtgtcacattcatcacaggtttcataccaaca^ggcttcagcacttcctttggtgtgtttcctgtccca
MDM2            NM_002392  CTACAGGGACGCCATCGAATCCGGATCTTGATGCTGGTGTAAGTGAACATTCAGGTGATTGGTTGGAT
MMP9            NM_004994  GAGAACCAATCTCACCGACAGGCAGCTGGCAGAGGAATACCTGTACCGCTATGGTTACACTCGGGTG
MTA1            NM_004689  CCGCCCTCACCTGAAGAGAAACGCGCTCCTTGGCGGACACTGGGGGAGGAGAGGAAGAAGCGCGGCTAACTTATTCC
MYBL2           NM_002466  GCCGAGATCGCCAAGATGTTGCCAGGGAGGACAGACAATGCTGTGAAGAATCACTGGAACTCTACCATCAAAAG
P14ARF          S78535     CCCTCGTGCTGATGCTACTGAGGAGCCAGCGTCTAGGGCAGCAGCCGCTTCCTAGAAGACCAGGTCATGATG
p27             NM_004064  CGGTGGACCACGAAGAGTTAACCCGGGACTTGGAGAAGCACTGCAGAGACATGGAAGAGGCGAGCC
P53             NM_000546  CTTTGAACCCTTGCTTGCAATAGGTGTGCGTCAGAAGCACCCAGGACTTCCATTTGCTTTGTCCCGGG
PAI1            NM_000602  CCGCAACGTGGTTTTCTCACCCTATGGGGTGGCCTCGGTGTTGGCCATGCTCCAGCTGACAACAGGAGGAGAAACCCAGCA
PDGFRb          NM_002609  CCAGCTCTCCTTCCAGCTACAGATCAATGTCCCTGTCCGAGTGCTGGAGCTAAGTGAGAGCCACCC
PI3KC2A         NM_002645  ATACCAATCACCGCACAAACCCAGGCTATTTGTTAAGTCCAGTC>CAGCGCAAAGAAACATATGCGGAGAAAATGCTAGTGTG
PPM1D           NM_003620  GCCATCCGCAAAGGCTTTCTCGCTTGTCACCTTGCCATGTGGAAGAAACTGGCGGAATGGCC
PR              NM_000926  gcatcaggctgtcattatggtgtccttacctgtgggagctgtaaggtcttctttaagagggcaatggaagggcagcacaactact
PRAME           NM_006115  tctccatatctgccttgcagagtctcctgcagcacctcatcgggctgagcaatctgacccacgtgc
pS2             NM_003225  gccctcccagtgtgcaaataagggctgctgtttcgacgacaccgttcgtggggtcccctggtgcttctatcctaataccatcgacg
RAD51C          NM_058216  gaacttcttgagcaggagcatacccagggcttcataatcaccttctgttcagcactagatgatattcttgggggtgga
RB1             NM_000321  cg^gcccttacaagtttcctagttcacccttacggattcctggagggaacatctatatttcacccctgaagagtc
RIZ1            NM_012231  ccagacgagcgattagaagcggcagcttgtgaggtgaatgatttgggggaagaggaggaggaggaagaggagga
STK15           NM_003600  catcttccaggaggaccactctctgtggcaccctggactacctgccccctgaaatgattgaaggtcgga
STMY3           NM_005940  cctggaggctgcaacatacctcaatcctgtcccaggccggatcctcctgaagcccttttcgcagcactgctatcctccaaagccattgta

Table 5B

Gene            Accession  Seq
SURV            NM_001168  TGTTTTGATTCCCGGGCTTACCAGGTGAGAAGTGAGGGAGGA
TBP             NM_003194  GCCCGAAACGCCGAATATAATCCCAAGCGGTTTGCTGCGGTAATCATGAGGATAAGAGAGCCACG
TGFA            NM_003236  GGTGTGCCACAGACCTTCCTACTTGGCCTGTAATCACCTGTGCAGCCTTTTGTGGGCCTTCAAAACTCTGTCAAGAACTCCGT
TIMP1           NM_003254  TCCCTGCGGTCCCAGATAGCCTGAATCCTGCCCGGAGTGGAACTGAAGCCTGCACAGTGTCCACCCTGTTCCCAC
TOP2A           NM_001067  AATCCAAGGGGGAGAGTGATGACTTCCATATGGACTTTGACTCAGCTGTGGCTCCTCGGGCAAAATCTGTAC
TOP2B           NM_001068  TGTGGACATCTTCCCCTCAGACTTCCCTACTGAGCCACCTTCTCTGCCACGAACCGGTCGGGCTAG
TP              NM_001953  CTATATGCAGCCAGAGATGTGACAGCCACCGTGGACAGCCTGCCACTCATCACAGCCTCCATTCTCAGTAAGAAACTCGTGG
TP53BP2         NM_005426  gggccaaatattcagaagcttttatatcagaggaccaccatagcggccatggagaccatctctgtcccatcatacccatcc
TRAIL           NM_003810  cttcacagtgctcctgcagtctctctgtgtggctgtaacttacgtgtactttaccaacgagctgaagcagatg
TS              NM_001071  gcctcggtgtgcctttcaacatcgccagctacgccctgctcacgtacatgattgcgcacatcacg
upa             NM_002653  GTGGATGTGCCCTGAAGGACAAGCCAGGCGTCTACACGAGAGTCTCACACTTCTTACCCTGGATCCGCAG
VDR             NM_000376  gccctggatttcagaaagagccaagtctggatctgggaccctttccttccttccctggcttgtaact
VEGF            NM_003376  ctgctgtcttgggtgcattggagccttgccttgctgctctacctccaccatgccaagtggtcccaggctgc
VEGFB           NM_003377  tgacgatggcctggagtgtgtgcccactgggcagcaccaagtccggatgcagatcctcatgatccggtacc
WISP1           NM_003882  agaggcatccatgaacttcacacttgcgggctgcatcagcacacgctcctatcaacccaagtactgtggagtttg
XIAP            NM_001167  gcagttggaagacacaggaaagtatccccaaattgcagatttatcaacggcttttatcttgaaaatagtgccacgca
YB-1            NM_004559  agactgtggagtttgatgttgttgaaggagaaaagggtgcggaggcagcaaatgttacaggtcctggtggtgttcc
ZNF217          NM_006526  acccagtagcaaggagaagcccactcactgctccgagtgcggcaaagctttcagaacctaccaccagctg
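
As a consistency check on the sequence listings, the amplicon sequences of Table 5A can be compared with the primer and probe sequences of Table 6A: the forward primer is expected at the start of the amplicon, the reverse complement of the reverse primer at its end, and the probe somewhere in between. The Python sketch below performs this check for AKT1 using the sequences listed in those tables. The function names are hypothetical, and accepting a probe on either strand is an assumption made to keep the check general.

```python
# Translation table for complementing unambiguous DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper case)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def check_primer_set(amplicon, forward, reverse, probe):
    """Check that primers and probe are consistent with an amplicon."""
    amp = amplicon.upper()
    probe = probe.upper()
    return {
        "forward_at_start": amp.startswith(forward.upper()),
        "reverse_at_end": amp.endswith(reverse_complement(reverse)),
        # Accept a probe designed on either strand
        "probe_inside": probe in amp or reverse_complement(probe) in amp,
    }


# AKT1 amplicon (Table 5A) and primer/probe sequences (Table 6A)
AKT1_AMPLICON = (
    "cgcttctatggcgctgagattgtgtcagccctggactacctgcactcgg"
    "agaagaacgtggtgtaccggga"
)
akt1_check = check_primer_set(
    AKT1_AMPLICON,
    forward="CGCTTCTATGGCGCTGAGAT",
    reverse="TCCCGGTACACCACGTTCTT",
    probe="CAGCCCTGGACTACCTGCACTCGG",
)
```

All three entries of `akt1_check` are true for the AKT1 sequences. A failed check for another gene would point to a transcription error in one of the tables or to an amplicon listing that is truncated.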

Table 6A



Gene 


Accession 


Probe Name 


Seq Len 




AIB1 


. NM_ 


006534 


S1994/AlB1.f3 


GCGGCGAGTTTCCGATTTA 


19 


AIB1 


NM_ 


006534 


S1995/A!B1.r3 


TGAGTCCACCATCCAGCAAGT 


21 


AIB1 


NM_ 


006534 


S5055/AIB1.p3 


ATGGCGGCGGGAGGATCAAAA 


21 


AKT1 


NM, 


005163 


S0010/AKT1.f3 


CGCTTCTATGGCGCTGAGAT 


20 


AKT1 


Nl\/f 


005163 


S0012/AKT1.r3 


TCCCGGTACACCACGTTCTT 


20 


AKT1 


NM_ 


005163 


S4776/AKT1 .p3 


CAGCCCTGGACTACCTGCACTCGG 


24 


AKT2 


NM_ 


001626 


S0828/AKT2.f3 


TCCTGCCACCCTTCAAACC 


19 


AKT2 


NM_ 


001626 


S0829/AKT2.r3 


GGCGGTAAATTCATCATCGAA 


21 


AKT2 


NM_ 


001626 


S4727/AKT2.p3 


CAGGTCACGTCCGAGGTCGACACA . 


24 


APC 


NM, 


000038 


S0022/APC.f4 


GGACAGCAGGAATGTGTTTC 


20 


APC 


nm~ 


000038 


S0024/APC.r4 


ACCCACTCGATTTGTTTCTG 


20 


APC 




.000038 


S4888/APC.p4 


CATTGGCTCCCCGTGACCTGTA 


22 


AREG 


nm~ 


.001657 


S0025/AREG.f2 


TGTGAGTGAAATGCCTTCTAGTAGTGA 


27 


AREG 


NM_ 


.001657 


S0027/AREG.r2 


TTGTGGTTCGTTATCATACTCTTCTGA 


27 


AREG 


NM_ 


.001657 


S4889/AREG.p2 


CCGTCCTCGGGAGCCGACTATGA 


23 


B-actin 


NM, 


.001101 


S0034/B-acti.f2 


CAGCAGATGTGGATCAGCAAG 


21 


B-actin 


. NM_ 


.001101 


S0036/B-acti.r2 


GCATTTGCGGTGGACGAT 


18 


B-actin 


NM. 


JD01101 


S4730/B-acti.p2 


AGGAGTATGACGAGTCCGGCCCC 


23 


B-Catenin 


NM_ 


.001904 


S2150/B-Cate.f3 


GGCTCTTGTGCGTACTGTCCTT 


22 


B-Catenin 


NM_ 


,001904 


S2151/B-Cate.r3 


TCAGATGACGAAGAGCACAGATG 


23 


B-Catenin 


NM_ 


,001904 


S5046/B-Cate.p3 


AGGCTCAGTGATGTCTTCCCTGTCACCAG 


29 


BAD 


NM_ 


_032989 


S2011/BAD.f1 


G G GTC AG GTG C CTC GAG AT 


19 


BAD 


NM_ 


_032989 


S20|2/BAD.r1 


CTGCTCACTCGGCTCAAACTC . 


21 


BAD 


NM. 


,032989 


S5058/BAD.p1 


TGGGCCCAGAGCATGTTCCAGATC 


24 


BAG1 


■. NM. 


_004323 


S1386/BAG1.f2 


CGTTGTCAGCACTTGGAATACAA 


23 


BAG1 


NM. 


,004323 


S1387/BAG1.r2 


GTTCAACCTCTTCCTGTGGACTGT 


24 


BAG1 


. NM] 


,004323 


S4731/BAG1.p2 


CCCAATTAACATGACCCGGCAACCAT 


26 


BBC3 


NM. 


..014417 


S1584/BBC3.f2 


CCTGGAGGGTCCTGTACAAT 


20 


BBC3 


NM. 


,014417 


S1585/BBC3.r2 


CTAATTG G G CTC C ATCTC G 


19 


BBC3 


NM. 


,014417 


S4890/BBC3.p2 


CATCATGGGACTCCTGCCCTTACC 


24 


Bc!2 


NM. 


_000633 


S0043/Bcl2.f2 


CAGATGGACCTAGTACCCACTGAGA 


25 


Bcl2 


NM. 


,000633 


S0045/Bcl2.r2 


CCTATGATTTAAGGGCA I I I I I CC 


24 


Bcl2 


NM. 


.000633 


S4732/Bcl2.p2 . 


TTCCACGCCGAAGGACAGCGAT 


22 


CA9 


NM. 


,001216 


S1398/CA9.f3 


ATCCTAGCCCTGG I I I I I GG 


20 


CA9 


NM 


.001216 


S1399/CA9.r3 


CTGCCTTCTCATCTGCACAA 


20 


CA9 


NM. 


,001216 


S4938/CA9.p3 


TTTG CTGTC AC C AG C GTC G C 


20 


CCNB1 


NM. 


.031966 


S1720/CCNB1.f2 


TTCAGGTTGTTGCAGGAGAC 


20 


CCNB1 


NM. 


.031966 


S1721/CCNB1.r2 


C ATCTTCTTG G G C AC AC AAT 


20 


CCNB1 


NM. 


_031966 


S4733/CCNB1.p2 


TGTCTCC ATTATTG ATC G GTTC ATG C A 


27 


CCND1 


NM 


,001758 


S0058/CCND1.f3 


GCATGTTCGTGGCCTCTAAGA 


. 21 


CCND1 


NM. 


,001 758 


S0060/CCND1,r3 


C G GTGTAG ATGC AC AG CTTCTC 


22 


CCND1 


NM. 


,001758 


S4986/CCND1.p3 


AAGGAGACCATCCCCCTGACGGC 


23 


CCNE1 


NM. 


,001238 


S1446/CCNE1.f1 


AAAG AAG ATG ATG AC C G G GTTTAC 


24 


CCNE1 


NM. 


,001238 


S1447/CCNE1.r1 


GAGCCTCTGGATGGTGCAAT 


20 


CCNE1 


NM. 


,001238 


S4944/CCNE1.p1 


CAAACTCAACGTGCAAGCCTCGGA 


24 


CCNE2 


NM_057749


S1458/CCNE2.f2


ATGCTGTGGCTCCTTCCTAACT 


22 


CCNE2 


NM_057749


S1459/CCNE2.r2 


ACCCAAATTGTGATATACAAAAAGGTT 


27 


CCNE2 


NM_057749


S4945/CCNE2.p2 


TACCAAGCAACCTACATGTCAAGAAAGCCC 


30 


CD3z 


NM_000734


S0064/CD3z.f1 


AGATGAAGTGGAAGGCGCTT


20 


CD3z 


NM_000734


S0066/CD3z.r1 


TGCCTCTGTAATCGGCAACTG


21 


CD3z 


NM_000734


S4988/CD3z.p1 


CACCGCGGCCATCCTGCA 


18 


CD68 


NM_001251


S0067/CD68.f2 


TGGTTCCCAGCCCTGTGT 


18 


CD68 


NM_001251


S0069/CD68.r2 


CTCCTCCACCCTGGGTTGT 


19 


CD68 


NM_001251


S4734/CD68.p2 


CTCCAAGCCCAGATTCAGATTCGAGTCA 


28 


CD9 


NM_001769


S0686/CD9.f1 


GGGCGTGGAACAGTTTATCT


20 


CD9 


NM_001769


S0687/CD9.r1 


CACGGTGAAGGTTTCGAGT 


19 


CD9 


NM_001769


S4792/CD9.p1 


AGACATCTGCCCCAAGAAGGACGT 


24 


CDH1 


NM_004360


S0073/CDH1.f3 


TGAGTGTCCCCCGGTATCTTC 


21 


CDH1 


NM_004360


S0075/CDH1.r3 


CAGCCGCTTTCAGATTTTCAT 


21 


CDH1 


NM_004360


S4990/CDH1.p3 


TGCCAATCCCGATGAAATTGGAAATTT


27 


CEGP1 


NM_020974


S1494/CEGP1.f2 


TGACAATCAGCACACCTGCAT 


21 
CEGP1


NM_020974 


S1495/CEGP1.r2 


TGTGACTACAGCCGTGATCCTTA 


23 


CEGP1 


NM_020974 


S4735/CEGP1.p2 


CAGGCCCTCTTCCGAGCGGT 


20 


Chk1 


NM_001274


S1422/Chk1.f2 


GATAAATTGGTACAAGGGATCAGCTT 


26 


Chk1 


NM_001274


S1423/Chk1.r2 


GGGTGCCAAGTAACTGACTATTCA 


24 


Chk1 


NM_001274 


S4941/Chk1.p2 


CCAGCCCACATGTCCTGATCATATGC 


26 


CIAP1 


NM_001166


S0764/CIAP1.f2 


TGCCTGTGGTGGGAAGCT 


18 


CIAP1 


NM_001166 


S0765/CIAP1.r2 


GGAAAATGCCTCCGGTGTT 


19 


CIAP1 


NM_001166 


S4802/CIAP1.p2 


TGACATAGCATCATCCTTTGGTTCCCAGTT 


30 


CIAP2 


NM_001165 


S0076/cIAP2.f2 


GGATATTTCCGTGGCTCTTATTCA 


24 


CIAP2 


NM_001165


S0078/cIAP2.r2


CTTCTCATCAAGGCAGAAAAATCTT


25 


CIAP2 


NM_001165


S4991/cIAP2.p2


TCTCCATCAAATCCTGTAAACTCCAGAGCA 


30 


cMet 


NM_000245


S0082/cMet.f2 


GACATTTCCAGTCCTGCAGTCA


22 


cMet 


NM_000245


S0084/cMet.r2 


CTCCGATCGCACACATTTGT 


20 


cMet 


NM_000245 


S4993/cMet.p2 


TGCCTCTCTGCCCCACCCTTTGT 


23 


Contig 27882 


AK000618 


S2633/Contig.f3 


GGCATCCTGGCCCAAAGT 


18 


Contig 27882 


AK000618 


S2634/Contig.r3 


GACCCCCTCAGCTGGTAGTTG 


21 


Contig 27882 


AK000618


S4977/Contig.p3 


CCCAAATCCAGGCGGCTAGAGGC 


23 


COX2 


NM_000963


S0088/COX2.f1 


TCTGCAGAGTTGGAAGCACTCTA 


23 


COX2 


NM_000963 


S0090/COX2.r1 


GCCGAGGCTTTTCTACCAGAA


21 


COX2 


NM_000963


S4995/COX2.p1


CAGGATACAGCTCCACAGCATCGATGTC


28


CTSL 


NM_001912 


S1303/CTSL.f2 


GGGAGGCTTATCTCACTGAGTGA


23 


CTSL 


NM_001912


S1304/CTSL.r2 


CCATTGCAGCCTTCATTGC 


19 


CTSL 


NM_001912 


S4899/CTSL.p2 


TTGAGGCCCAGAGCAGTCTACCAGATTCT 


29 


CTSL2 


NM_001333


S4354/CTSL2.f1 


TGTCTCACTGAGCGAGCAGAA 


21 


CTSL2 


NM_001333 


S4355/CTSL2.r1 


ACCATTGCAGCCCTGATTG 


19 


CTSL2 


NM_001333


S4356/CTSL2.p1 


CTTGAGGACGCGAACAGTCCACCA 


24 


DAPK1 


NM_004938


S1768/DAPK1.f3


CGCTGACATCATGAATGTTCCT 


22 


DAPK1 


NM_004938 


S1769/DAPK1.r3 


TCTCTTTCAGCAACGATGTGTCTT 


24 


DAPK1 


NM_004938


S4927/DAPK1.p3


TCATATCCAAACTCGCCTCCAGCCG


25 


DIABLO 


NM_019887


S0808/DIABLO.f1


CACAATGGCGGCTCTGAAG 


19 


DIABLO 


NM_019887 


S0809/DIABLO.r1


ACACAAACACTGTCTGTACCTGAAGA


26 


DIABLO 


NM_019887


S4813/DIABLO.p1 


AAGTTACGCTGCGCGACAGCCAA 


23 


DR5 


NM_003842


S2551/DR5.f2 


CTCTGAGACAGTGCTTCGATGACT 


24 


DR5 


NM_003842


S2552/DR5.r2 


CCATGAGGCCCAACTTCCT 


19 


DR5 


NM_003842 


S4979/DR5.p2 


CAGACTTGGTGCCCTTTGACTCC 


23 


EGFR 


NM_005228 


S0103/EGFR.f2 


TGTCGATGGACTTCCAGAAC 


20 


EGFR 


NM_005228 


S0105/EGFR.r2 


ATTGGGACAGCTTGGATCA 


19 


EGFR 


NM_005228 


S4999/EGFR.p2 


CACCTGGGCAGCTGCCAA 


18 


EIF4E 


NM_001968 


S0106/EIF4E.f1 


GATCTAAGATGGCGACTGTCGAA 


23 


EIF4E 


NM_001968 


S0108/EIF4E.r1 


TTAGATTCCGTTTTCTCCTCTTCTG


25 


EIF4E 


NM_001968 


S5000/EIF4E.p1


ACCACCCCTACTCCTAATCCCCCGACT 


27 


EMS1 


NM_005231 


S2663/EMS1.f1 


GGCAGTGTCACTGAGTCCTTGA 


22 


EMS1 


NM_005231 


S2664/EMS1.r1


TGCACTGTGCGTCCCAAT 


18 


EMS1 


NM_005231 


S4956/EMS1.p1 


ATCCTCCCCTGCCCCGCG 


18 


EpCAM 


NM_002354 


S1807/EpCAM.f1 


GGGCCCTCCAGAACAATGAT 


20 


EpCAM 


NM_002354 


S1808/EpCAM.r1 


TGCACTGCTTGGCCTTAAAGA


21 


EpCAM 


NM_002354 


S4984/EpCAM.p1 


CCGCTCTCATCGCAGTCAGGATCAT


25 


EPHX1 


NM_000120 


S1865/EPHX1.f2 


ACCGTAGGCTCTGCTCTGAA 


20 


EPHX1 


NM_000120 


S1866/EPHX1.r2 


TGGTCCAGGTGGAAAACTTC


20 


EPHX1 


NM_000120


S4754/EPHX1.p2 


AGGCAGCCAGACCCACAGGA 


20 


ErbB3 


NM_001982


S0112/ErbB3.f1 


CGGTTATGTCATGCCAGATACAC 


23 


ErbB3 


NM_001982 


S0114/ErbB3.r1 


GAACTGAGACCCACTGAAGAAAGG 


24


ErbB3 


NM_001982 


S5002/ErbB3.p1 


CCTCAAAGGTACTCCCTCCTCCCGG


25 


EstR1 


NM_000125


S0115/EstR1.f1 


CGTGGTGCCCCTCTATGAC 


19 


EstR1 


NM_000125


S0117/EstR1.r1 


GGCTAGTGGGCGCATGTAG 


19 


EstR1 


NM_000125 


S4737/EstR1.p1 


CTGGAGATGCTGGACGCCC 


19 


FBXO5


NM_012177


S2017/FBXO5.r1


GGATTGTAGACTGTCACCGAAATTC


25 


FBXO5


NM_012177


S2018/FBXO5.f1 


GGCTATTCCTCATTTTCTCTACAAAGTG 


28 


FBXO5


NM_012177 


S5061/FBXO5.p1 


CCTCCAGGAGGCTACCTTCTTCATGTTCAC 


30 


FGF18 


NM_003862


S1665/FGF18.f2 


CGGTAGTCAAGTCCGGATCAA 


21 


FGF18 


NM_003862


S1666/FGF18.r2 


GCTTGCCTTTGCGGTTCA 


18 


FGF18 


NM_003862


S4914/FGF18.p2


CAAGGAGACGGAATTCTACCTGTGC 


25 






Table 6C 



FGFR1 


NM_023109 


S0818/FGFR1.f3 


CACGGGACATTCACCACATC 


20 


FGFR1 


NM_023109 


S0819/FGFR1.r3 


GGGTGCCATCCACTTCACA


19 


FGFR1 


NM_023109 


S4816/FGFR1.p3 


ATAAAAAGACAACCAACGGGCGACTGC 


27 


FHIT 


NM_002012


S2443/FHIT.f1


CCAGTGGAGCGCTTCCAT 


18 


FHIT 


NM_002012


S2444/FHIT.r1 


CTCTCTGGGTCGTCTGAAACAA 


22 


FHIT 


NM_002012 


S2445/FHIT.p1 


TCGGCCACTTCATCAGGACGCAG


23 


FHIT 


NM_002012


S4921/FHIT.p1 


TCGGCCACTTCATCAGGACGCAG 


23 


FRP1 


NM_003012


S1804/FRP1.f3 


TTGGTACCTGTGGGTTAGCA 


20 


FRP1 


NM_003012


S1805/FRP1.r3 


CACATCCAAATGCAAACTGG 


20 


FRP1 


NM_003012 


S4983/FRP1.p3 


TCCCCAGGGTAGAATTCAATCAGAGC 


26 


G-Catenin 


NM_002230


S2153/G-Cate.f1 


TCAGCAGCAAGGGCATCAT 


19 


G-Catenin 


NM_002230


S2154/G-Cate.r1


GGTGGTTTTCTTGAGCGTGTACT


23 


G-Catenin 


NM_002230


S5044/G-Cate.p1 


CGCCCGCAGGCCTCATCCT 


19 


GAPDH 


NM_002046 


S0374/GAPDH.f1 


ATTCCACCCATGGCAAATTC 


20 


GAPDH 


NM_002046


S0375/GAPDH.r1


GATGGGATTTCCATTGATGACA 


22 


GAPDH 


NM_002046


S4738/GAPDH.p1


CCGTTCTCAGCCTTGACGGTGC


22 


GATA3 


NM_002051


S0127/GATA3.f3


CAAAGGAGCTCACTGTGGTGTCT


23 


GATA3 


NM_002051


S0129/GATA3.r3 


GAGTCAGAATGGCTTATTCACAGATG 


26 


GATA3 


NM_002051 


S5005/GATA3.p3 


TGTTCCAACCACTGAATCTGGACC 


24 


GRB7 


NM_005310


S0130/GRB7.f2 


CCATCTGCATCCATCTTGTT 


20 


GRB7 


NM_005310


S0132/GRB7.r2 


GGCCACCAGGGTATTATCTG 


20 


GRB7 


NM_005310


S4726/GRB7.p2 


CTCCCCACCCTTGAGAAGTGCCT 


23 


GRO1


NM_001511


S0133/GRO1.f2 


CGAAAAGATGCTGAACAGTGACA 


23 


GRO1


NM_001511


S0135/GRO1.r2 


TCAGGAACAGCCACCAGTGA 


20 


GRO1


NM_001511


S5006/GRO1.p2


CTTCCTCCTCCCTTCTGGTCAGTTGGAT 


28 


GSTM1


NM_000561


S2026/GSTM1.r1


GGCCCAGCTTGAATTTTTCA


20 


GSTM1 


NM_000561


S2027/GSTM1.f1 


AAGCTATGAGGAAAAGAAGTACACGAT 


27 


GSTM1 


NM_000561


S4739/GSTM1.p1 


TCAGCCACTGGCTTCTGTCATAATCAGGAG 


30 


GUS 


NM_000181


S0139/GUS.f1 


CCCACTCAGTAGCCAAGTCA 


20 


GUS 


NM_000181 


S0141/GUS.r1 


CACGCAGGTGGTATCAGTCT 


20 


GUS 


NM_000181


S4740/GUS.p1


TCAAGTAAACGGGCTGTTTTCCAAACA


27 


HER2 


NM_004448


S0142/HER2.f3 


CGGTGTGAGAAGTGCAGCAA 


20 


HER2 


NM_004448


S0144/HER2.r3 


CCTCTCGCAAGTGCTCCAT 


19 


HER2 


NM_004448 


S4729/HER2.p3 


CCAGACCATAGCACACTCGGGCAC


24 


HIF1A 


NM_001530 


S1207/HIF1A.f3 


TGAACATAAAGTCTGCAACATGGA 


24 


HIF1A 


NM_001530 


S1208/HIF1A.r3 


TGAGGTTGGTTACTGTTGGTATCATATA


28 


HIF1A 


NM_001530 


S4753/HIF1A.p3 


TTGCACTGCACAGGCCACATTCAC


24 


HNF3A 


NM_004496


S0148/HNF3A.f1 


TCCAGGATGTTAGGAACTGTGAAG 


24 


HNF3A 


NM_004496 


S0150/HNF3A.r1 


GCGTGTCTGCGTAGTAGCTGTT 


22 


HNF3A 


NM_004496 


S5008/HNF3A.p1 


AGTCGCTGGTTTCATGCCCTTCCA 


24 


ID1 


NM_002165 


S0820/ID1.f1 


AGAACCGCAAGGTGAGCAA 


19 


ID1 


NM_002165 


S0821/ID1.r1 


TCCAACTGAAGGTCCCTGATG


21 


ID1 


NM_002165 


S4832/ID1.p1 


TGGAGATTCTCCAGCACGTCATCGAC 


26 


IGF1 


NM_000618 


S0154/IGF1.f2


TCCGGAGCTGTGATCTAAGGA 


21 


IGF1 


NM_000618


S0156/IGF1.r2 


CGGACAGAGCGAGCTGACTT 


20 


IGF1 


NM_000618 


S5010/IGF1.p2 


TGTATTGCGCACCCCTCAAGCCTG 


24 


IGF1R 


NM_000875


S1249/IGF1R.f3 


GCATGGTAGCCGAAGATTTCA 


21 


IGF1R 


NM_000875 


S1250/IGF1R.r3 


TTTCCGGTAATAGTCTGTCTCATAGATATC 


30 


IGF1R 


NM_000875


S4895/IGF1R.p3


CGCGTCATACCAAAATCTCCGATTTTGA


28 


IGFBP2 


NM_000597 


S1128/IGFBP2.f1


GTGGACAGCACCATGAACA 


19 


IGFBP2 


NM_000597 


S1129/IGFBP2.r1 


CCTTCATACCCGACTTGAGG 


20 


IGFBP2 


NM_000597


S4837/IGFBP2.p1 


CTTCCGGCCAGCACTGCCTC 


20 


IL6 


NM_000600 


S0760/IL6.f3


CCTGAACCTTCCAAAGATGG 


20 


IL6 


NM_000600 


S0761/IL6.r3 


ACCAGGCAAGTCTCCTCATT 


20 


IL6 


NM_000600 


S4800/IL6.p3 


CCAGATTGGAAGCATCCATCTTTTTCA


27 


IRS1 


NM_005544 


S1943/IRS1.f3 


CCACAGCTCACCTTCTGTCA 


20 


IRS1 


NM_005544 


S1944/IRS1.r3 


CCTCAGTGCCAGTCTCTTCC 


20 


IRS1 


NM_005544


S5050/IRS1.p3 


TCCATCCCAGCTCCAGCCAG 


20 


Ki-67 


NM_002417


S0436/Ki-67.f2 


CGGACTTTGGGTGCGACTT 


19 


Ki-67 


NM_002417


S0437/Ki-67.r2 


TTACAACTCTTCCACTGGGACGAT 


24 


Ki-67 


NM_002417 


S4741/Ki-67.p2 


CCACTTGTCGAACCACCGCTCGT 


23 


KLK10 


NM_002776


S2624/KLK10.f3 


GCCCAGAGGCTCCATCGT 


18 
[The table rows for KLK10 (r3, p3), KRT14, KRT17, KRT18, KRT19, KRT5, KRT8, LOT1 variant 1, Maspin, MCM2, and MCM3 are illegible in the source scan; their accession numbers, primer/probe names, and sequences could not be recovered.]


CTTTGAACCCTTGCTTGCAA


20 




P53


NM_000546


S0210/P53.r2


CCCGGGACAAAGCAAATG 


18 


P53 


NM_000546 


S5065/P53.p2 


AAGTCCTGGGTGCTTCTGACGCACA 


25 


PAI1


NM_000602 


S0211/PAI1.f3 


CCGCAACGTGGTTTTCTCA 


19 


PAI1


NM_000602 


S0213/PAI1.r3 


TGCTGGGTTTCTCCTCCTGTT


21 


PAI1 


NM_000602 


S5066/PAI1.p3 


CTCGGTGTTGGCCATGCTCCAG 


22 


PDGFRb 


NM_002609


S1346/PDGFRb.f3 


CCAGCTCTCCTTCCAGCTAC 


20 


PDGFRb 


NM_002609 


S1347/PDGFRb.r3 


GGGTGGCTCTCACTTAGCTC 


20 


PDGFRb 


NM_002609 


S4931/PDGFRb.p3 


ATCAATGTCCCTGTCCGAGTGCTG 


24 
Table 6E



PI3KC2A


NM_002645


S2020/PI3KC2.r1


CACACTAGCATTTTCTCCGCATA


23 


PI3KC2A


NM_002645 


S2021/PI3KC2.f1 


ATACCAATCACCGCACAAACC 


21 


PI3KC2A 


NM_002645 


S5062/PI3KC2.p1


TGCGCTGTGACTGGACTTAACAAATAGCCT 


30 


PPM1D 


NM_003620


S3159/PPM1D.f1 


GCCATCCGCAAAGGCTTT 


18 


PPM1D 


NM_003620


S3160/PPM1D.r1


GGCCATTCCGCCAGTTTC 


18 


PPM1D 


NM_003620


S4856/PPM1D.p1 


TCGCTTGTCACCTTGCCATGTGG 


23 


PR 


NM_000926


S1336/PR.f6 


GCATCAGGCTGTCATTATGG 


20 


PR 


NM_000926


S1337/PR.r6 


AGTAGTTGTGCTGCCCTTCC 


20 


PR 


NM_000926


S4743/PR.p6


TGTCCTTACCTGTGGGAGCTGTAAGGTC


28 


PRAME 


NM_006115 


S1985/PRAME.f3 


TCTCCATATCTGCCTTGCAGAGT 


23 


PRAME 


NM_006115


S1986/PRAME.r3 


GCACGTGGGTCAGATTGCT 


19 


PRAME 


NM_006115


S4756/PRAME.p3 


TCCTGCAGCACCTCATCGGGCT 


22 


pS2 


NM_003225 


S0241/pS2.f2 


GCCCTGCCAGTGTGCAAAT 


19 


pS2


NM_003225


S0243/pS2.r2 


CGTCGATGGTATTAGGATAGAAGCA 


25 


pS2 


NM_003225


S5026/pS2.p2 


TGCTGTTTCGACGACACCGTTCG 


23 


RAD51C 


NM_058216


S2606/RAD51C.f3 


GAACTTCTTGAGCAGGAGCATACC 


24 


RAD51C 


NM_058216


S2607/RAD51C.r3


TCCACCCCCAAGAATATCATCTAGT


25 


RAD51C 


NM_058216


S4764/RAD51C.p3


AGGGCTTCATAATCACCTTCTGTTC


25 


RB1 


NM_000321 


S2700/RB1.f1 


CGAAGCCCTTACAAGTTTCC 


20 


RB1 


NM_000321 


S2701/RB1.r1 


GGACTCTTCAGGGGTGAAAT 


20 


RB1 


NM_000321 


S4765/RB1.p1 


CCCTTACGGATTCCTGGAGGGAAC 


24 


RIZ1 


NM_012231 


S1320/RIZ1.f2 


CCAGACGAGCGATTAGAAGC


20 


RIZ1 


NM_012231


S1321/RIZ1.r2 


TCCTCCTCTTCCTCCTCCTC 


20 


RIZ1 


NM_012231


S4761/RIZ1.p2 


TGTGAGGTGAATGATTTGGGGGA 


23 


STK15 


NM_003600


S0794/STK15.f2 


CATCTTCCAGGAGGACCACT 


20 


STK15 


NM_003600 


S0795/STK15.r2 


TCCGACCTTCAATCATTTCA 


20 


STK15 


NM_003600


S4745/STK15.p2 


CTCTGTGGCACCCTGGACTACCTG 


24 


STMY3 


NM_005940


S2067/STMY3.f3 


CCTGGAGGCTGCAACATACC 


20 


STMY3 


NM_005940


S2068/STMY3.r3 


TACAATGGCTTTGGAGGATAGCA 


23 


STMY3 


NM_005940 


S4746/STMY3.p3 


ATCCTCCTGAAGCCCTTTTCGCAGC


25 


SURV 


NM_001168


S0259/SURV.f2


TGTTTTGATTCCCGGGCTTA


20 


SURV 


NM_001168


S0261/SURV.r2 


CAAAGCTGTCAGCTCTAGCAAAAG 


24 


SURV 


NM_001168


S4747/SURV.p2 


TGCCTTCTTCCTCCCTCACTTCTCACCT 


28 


TBP 


NM_003194 


S0262/TBP.f1 


GCCCGAAACGCCGAATATA 


19 


TBP 


NM_003194


S0264/TBP.r1


CGTGGCTCTCTTATCCTCATGAT


23 


TBP 


NM_003194


S4751/TBP.p1 


TACCGCAGCAAACCGCTTGGG 


21 


TGFA 


NM_003236


S0489/TGFA.f2 


GGTGTGCCACAGACCTTCCT 


20 


TGFA 


NM_003236


S0490/TGFA.r2


ACGGAGTTCTTGACAGAGTTTTGA


24 


TGFA 


NM_003236


S4768/TGFA.p2 


TTGGCCTGTAATCACCTGTGCAGCCTT 


27 


TIMP1 


NM_003254


S1695/TIMP1.f3


TCCCTGCGGTCCCAGATAG


19 


TIMP1 


NM_003254


S1696/TIMP1.r3 


GTGGGAACAGGGTGGACACT 


20 


TIMP1 


NM_003254


S4918/TIMP1.p3 


ATCCTGCCCGGAGTGGAACTGAAGC 


25 


TOP2A 


NM_001067


S0271/TOP2A.f4


AATCCAAGGGGGAGAGTGAT


20 


TOP2A 


NM_001067


S0273/TOP2A.r4


GTACAGATTTTGCCCGAGGA


20 


TOP2A


NM_001067


S4777/TOP2A.p4 


CATATGGACTTTGACTCAGCTGTGGC 


26 


TOP2B


NM_001068


S0274/TOP2B.f2


TGTGGACATCTTCCCCTCAGA 


21 


TOP2B 


NM_001068


S0276/TOP2B.r2


CTAGCCCGACCGGTTCGT


18 


TOP2B 


NM_001068


S4778/TOP2B.p2 


TTCCCTACTGAGCCACCTTCTCTG 


24 


TP 


NM_001953


S0277/TP.f3 


CTATATGCAGCCAGAGATGTGACA 


24 


TP 


NM_001953


S0279/TP.r3


CCACGAGTTTCTTACTGAGAATGG


24 


TP 


NM_001953


S4779/TP.p3 


ACAGCCTGCCACTCATCACAGCC 


23 


TP53BP2 


NM_005426


S1931/TP53BP.f2 


GGGCCAAATATTCAGAAGC 


19 


TP53BP2 


NM_005426


S1932/TP53BP.r2 


GGATGGGTATGATGGGACAG 


20 


TP53BP2 


NM_005426 


S5049/TP53BP.p2 


CCACCATAGCGGCCATGGAG 


20 


TRAIL 


NM_003810 


S2539/TRAIL.f1


CTTCACAGTGCTCCTGCAGTCT 


22 


TRAIL 


NM_003810 


S2540/TRAIL.r1 


CATCTGCTTCAGCTCGTTGGT 


21 


TRAIL 


NM_003810 


S4980/TRAIL.p1 


AAGTACACGTAAGTTACAGCCACACA 


26 


TS 


NM_001071


S0280/TS.f1 


GCCTCGGTGTGCCTTTCA 


18 


TS 


NM_001071 


S0282/TS.r1 


CGTGATGTGCGCAATCATG


19 


TS 


NM_001071 


S4780/TS.p1 


CATCGCCAGCTACGCCCTGCTC 


22 


upa 


NM_002658 


S0283/upa.f3 


GTGGATGTGCCCTGAAGGA 


19 


upa 


NM_002658


S0285/upa.r3 


CTGCGGATCCAGGGTAAGAA 


20 
Table 6F



upa 


NM_002658


S4769/upa.p3


AAGCCAGGCGTCTACACGAGAGTCTCAC


28 


VDR 


NM_000376


S2745/VDR.f2


GCCCTGGATTTCAGAAAGAG 


20


VDR 


NM_000376


S2746/VDR.r2


AGTTACAAGCCAGGGAAGGA 


20 


VDR 


NM_000376


S4962/VDR.p2 


CAAGTCTGGATCTGGGACCCTTTCC 


25 


VEGF 


NM_003376


S0286/VEGF.f1


CTGCTGTCTTGGGTGCATTG


20 


VEGF 


NM_003376


S0288/VEGF.r1


GCAGCCTGGGACCACTTG 


18 


VEGF 


NM_003376


S4782/VEGF.p1 


TTGCCTTGCTGCTCTACCTCCACCA 


25 


VEGFB 


NM_003377


S2724/VEGFB.f1


TGACGATGGCCTGGAGTGT


19 


VEGFB 


NM_003377


S2725/VEGFB.r1


GGTACCGGATCATGAGGATCTG 


22 


VEGFB 


NM_003377


S4960/VEGFB.p1


CTGGGCAGCACCAAGTCCGGA 


21 


WISP1 


NM_003882


S1671/WISP1.f1


AGAGGCATCCATGAACTTCACA


22 


WISP1 


NM_003882


S1672/WISP1.r1


CAAACTCCACAGTACTTGGGTTGA 


24 


WISP1 


NM_003882


S4915/WISP1.p1 


CGGGCTGCATCAGCACACGC 


20 


XIAP


NM_001167






pp a pttpp a a p a p a p a pp a a a pt 




XIAP 


NM_001167


S0291/XIAP.r1


TGCGTGGCACTATTTTCAAGA


21 


XIAP 


NM_001167


S4752/XIAP.p1 


TCCCCAAATTGCAGATTTATCAACGGC 


27 


YB-1 


NM_004559


S1194/YB-1.f2


AGACTGTGGAGTTTGATGTTGTTGA 


25 


YB-1 


NM_004559


S1195/YB-1.r2 


GGAACACCACCAGGACCTGTAA


22 


YB-1 


NM_004559


S4843/YB-1.p2


TTGCTGCCTCCGCACCCTTTTCT


23 


ZNF217


NM_006526


S2739/ZNF217.f3 


ACCCAGTAGCAAGGAGAAGC 


20 


ZNF217 


NM_006526


S2740/ZNF217.r3 


CAGCTGGTGGTAGGTTCTGA 


20 


ZNF217 


NM_006526


S4961/ZNF217.p3 


CACTCACTGCTCCGAGTGCGG 


21 